
Day 2: 2022 EMSL Summer School—Know Your Metagenome

Know your metagenome: From research to data

Corydon Ireland
Four people standing in a field with tall grasses. Umbrella and sun shade are in the foreground.

Day 2 of EMSL Summer School focused on soil metagenomics. Darian Smercina, pictured second from the left, is one of this year's Summer School organizers and a postgraduate Pauling Fellow at Pacific Northwest National Laboratory who studies microbiome science. This image was taken near Kellogg Biological Station in Hickory Corners, Michigan, in front of a switchgrass field where the Michigan State University research team conducted a 13C-CO2 pulse chase. (Photo provided by Darian Smercina)

Depending on topography, weather, rates of erosion, and other factors, it takes at least 500 years to form one inch of topsoil.

However, scientists are moving fast to study what soils are made of, what organisms inhabit them, and how they lose and gain the critical gases, water, and elements stored within them in such vast quantities.

Carbon dioxide and methane, for instance, are seeping from soil into the atmosphere at higher rates than ever because of environmental perturbations. Think of tropical deforestation, for example, which turns soils from a sink for methane into a source.

Such higher rates of loss, including loss of carbon as carbon dioxide, are tilting Earth’s warming trends upward and contributing to climate change.

Luckily, scientists have an increasing number of tools at hand for studying soils. Such tools for collecting, analyzing, archiving, and modeling soils are the subject of “Soils Exposed!” The five-day summer school, running online July 18–22, is hosted by the Environmental Molecular Sciences Laboratory (EMSL), in partnership with the National Microbiome Data Collaborative (NMDC).

EMSL specializes in analytic technologies, including mass spectrometry, which characterize the makeup of soil samples (for instance) at the level of molecules and even the single cell. This science temple for analyzing the miniature is one of 28 user facilities funded by the Office of Science at the Department of Energy.

What is a Metagenome?

Bill Nelson speaks during the Soils Exposed 2022 Summer School
Bill Nelson, a scientist at Pacific Northwest National Laboratory, presented July 19 on environmental metagenomics. (Photo by Maegan Murray | EMSL)

Some of the tools for studying soils are designed to unlock genetic blueprints called genomes.

In turn, scientists study bulk samples in search of not a single genome, but a sample’s metagenome.

A metagenome is the raw genetic sum of all nucleotide sequences from all the organisms within a sample, which can be air, water, rock, animal tissue, plant matter, fungi, or soil.

Soil is a complex artifact made of all the other things in this list. It has a more complex metagenome than anything else on Earth.

From such environmental samples, researchers isolate genetic material and sequence it―that is, turn it into lines of code. After that, researchers analyze this sequence data to see what genes are present and from which organisms. They do this to pinpoint the structure and function of the genes within a sample and to even predict the influences that communities of organisms have on an ecological scale.

Studying soil at such ecological scales requires “untargeted” DNA sequencing, says presenter Bill Nelson, a scientist at Pacific Northwest National Laboratory (PNNL). Of day two’s three main presenters, he got the most airtime, beginning with a foundational talk on environmental metagenomics.

“Metagenomics is a survey technique,” which only “randomly” samples these largely microbial communities, he explains. “In general, the communities are going to be way more complex than you can sample.”
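
Nelson's point about undersampling can be illustrated with a little arithmetic. The sketch below is not from the talk. It assumes a hypothetical community of 10,000 species whose relative abundances follow a simple 1/rank pattern, and it treats every sequence read as an independent random draw from that community. Under those assumptions, the expected number of species hit at least once by n reads is the sum over species of 1 - (1 - p)^n, where p is each species' share of the community.

```python
def expected_species_observed(abundances, n_reads):
    """Expected number of distinct species hit at least once by n_reads random reads.

    Assumes each read is an independent draw, with species chosen in
    proportion to their abundance (an idealized model of random sampling).
    """
    total = sum(abundances)
    return sum(1.0 - (1.0 - a / total) ** n_reads for a in abundances)


# Hypothetical community: 10,000 species, the species of rank r has weight 1/r.
community = [1.0 / rank for rank in range(1, 10001)]

for n_reads in (1_000, 100_000):
    seen = expected_species_observed(community, n_reads)
    print(f"{n_reads:>8,} reads: about {seen:,.0f} of {len(community):,} species expected")
```

In this toy model, rare species dominate the list of what goes unseen, which is one way to picture why a survey of a complex community stays incomplete even as sequencing depth grows.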

Bacterial Diversity, Discovered

Still, Nelson says metagenomics opens a window into soil diversity and dynamics, and it renders information on ecosystem function.

In the last few decades, “there has been a phenomenal increase in the discovery of bacterial diversity,” says Nelson. But then there is this cautionary note: 32 percent of the genome of the most studied organism on Earth, E. coli, has unknown function.

Nelson explained the strengths and limitations of prominent analysis techniques, including amplicon sequencing, which is “great for discovering and measuring microbes,” and shotgun metagenomics, which “provides rich information but is expensive.”

Nelson also outlined ongoing challenges in sample collection, including DNA extraction bias and database bias.

“Whichever direction your research takes,” he says, “you need to be consistent in the techniques you use in (metagenomic) studies.”

Nelson presented a second talk detailing how to annotate metagenomic sequence data.

Annotation, with its steps to identify features and functions, is one way “to turn information into knowledge,” says Nelson, also a 20-year veteran of annotating metagenomes related to bacteria.

That means he is busy. A gram of soil contains an average of 10,000 species of bacteria.

What Holds Methane in and Lets it Out

Woman with glasses named Marie Kroeger presents on monitor. Laptop shows soil cores and [2022 Summer School Soils Exposed!]
Marie Kroeger, a scientist at Los Alamos National Laboratory, shared tips on analysis steps, taxonomies, and assembling population genomes. (Photo by Genoa Blankenship | EMSL)

Senior researchers, postdocs, and students listened in to “Soils Exposed!” along with those taking part in the school’s invitation-only afternoon tutorials. They are from all over North America, as well as places like India, Romania, and the Netherlands. All who attended were likely eager for a real-life example of using metagenomics in soil research.

Co-organizers Montana Smith and Darian Smercina introduced the first presenter, Marie Kroeger, who is a scientist at Los Alamos National Laboratory in New Mexico.

She offered two examples of applying metagenomics. Both helped her unpack how converting tracts of Brazilian rainforest to cattle pasture turns soils from safe storage bins for methane (as hinted above) into sources that send this potent greenhouse gas into the atmosphere.

Read all about it in a 2020 paper Kroeger published in a Nature publication.

She and her team collected 10-centimeter-long soil cores from three representative Brazilian landscapes: untouched rainforest, pastureland that was once tropical forest, and “secondary” rainforests that had (over 40 years or more) turned pasture back into forest.

Why use metagenomics? Because Kroeger and others arrived at their conclusions by studying 10 metagenomes to detect the fate of species in the microbial communities that cycle methane. In a pasture regime, they found that methane-producing species (methanogens) increased in abundance, diversity, and activity. That turned soils into smokestacks of methane emissions.

Kroeger delivered tips on analysis steps, taxonomies, assembling population genomes, relic DNA in soil samples (from dead microbes), tools that identify “active” soils, and biases in DNA kits.

Her plan next, she says, is to follow up on the seasonality of microbes and methanogenesis.

Standardized Workflows

Metagenome assembly
Los Alamos National Laboratory's Karen Davenport presented on metagenome assembly during Day 2 of the 2022 Summer School. (Photo by Maegan Murray | EMSL)

The Nelson and Kroeger presentations will be available online soon, in full. The same is true of all the talks delivered (or about to be delivered) at “Soils Exposed!”

Among those is a day two talk by Karen Davenport, a Los Alamos scientist who studies genome sequencing. The talk complemented the other day two presentations, all of which noted the necessity of workflows and pipelines in dealing with metagenomic analyses.

Presently, the reality of much metagenomics research creates “data sets that are not compatible,” she says. “We can’t inter-compare data from other studies.”

Davenport offers a still-developing set of workflows and metagenomics pipelines established by the NMDC.

NMDC hosts a collaborative data ecosystem for microbiome studies, including deeply intricate soil microbiomes.

There are five NMDC workflows for metagenomics. Raw sequence data runs through the NMDC's open-source EDGE bioinformatics platform, an interface that uses tools and algorithms that speed analysis.

The user also gets quality-control feedback, including a quality-control score for each sequence “read,” quality-control statistics, and suggestions on how to trim or filter reads.
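
For readers unfamiliar with per-read quality scores, here is a minimal sketch of the general idea. It is not the NMDC workflow's own code or algorithm. It assumes reads stored in FASTQ format with the common Phred+33 encoding, where each quality character maps to a score by subtracting 33 from its ASCII value. The function names, the sample read, and the cutoff of 20 are all illustrative choices.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string):
    """Average Phred score of a read; returns 0.0 for an empty read."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0


def trim_low_quality_tail(sequence, quality_string, min_quality=20):
    """Drop bases from the 3' end until the last base meets min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Made-up read: eight high-quality bases ('I' = Q40) followed by two poor ones.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIIII#!"

print(mean_quality(read_qual))                      # average score for the whole read
print(trim_low_quality_tail(read_seq, read_qual))   # read with the weak tail removed
```

Real pipelines typically use more careful strategies, such as sliding-window trimming and adapter removal, but the decoding step shown here is the basis for the quality statistics a user would see.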

EDGE is in a beta stage of development. For anyone interested in becoming a beta tester, Davenport says to email nmdc-edge@lanl.gov.

Still, she outlined many bioinformatics challenges, cautioning that “this is not a straightforward one-stop shop.”

Davenport recommends, as does Nelson, that researchers use “a combination of (metagenomics data processing and annotation) tools. There is no one perfect tool.”

She also restated a foundational and daunting fact for all soil researchers.

“The soil sample is the most complex,” Davenport says. “The more complex the metagenome, the less capable we are of assembling all the data.”

Watch the recorded presentations from EMSL Summer School on EMSL LEARN.