Day 1: 2022 EMSL Summer School—Metadata

Session digs into the soil microbiome

Corydon Ireland
A pen, laptop computer, and monitor with screen text [Soils Exposed! July 18-22]

Day 1 of EMSL Summer School focused on the National Microbiome Data Collaborative and EMSL's Molecular Observation Network. (Photo by Sadiya Maxamhud)

Soil is the world beneath our feet.

It’s a vertically variant mix of liquids, gases, chemicals, and living organisms. Soil occupies a complex and mysterious universe framed largely in a matrix of sand, silt, and clay. The vast stores of carbon it sequesters are vital to mitigating climate change and a key uncertainty in models of climate change.

To untangle all this complexity, it is important to collect samples that are coherently formatted and tell us about the chemical and biological complexity of soil. How do we analyze, organize, archive, and share data arising from samples in an integrated way?

The latest ways to do that form the curriculum of a five-day summer school called “Soils Exposed!” from July 18 – 22, 2022. This online event has drawn both early-career and veteran researchers to its morning public sessions and invitation-only afternoon tutorials.

“Soils Exposed!” is hosted by the Environmental Molecular Sciences Laboratory (EMSL), a Department of Energy (DOE), Office of Science, user facility at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Some of the instruments there help researchers peer into the physical, chemical, and genetic nature of soil samples.

Think of a DOE user facility as a free-of-charge laboratory with experts on a given subject and related top instruments—one that is open to scientists whose peer-reviewed projects are awarded for months or years of research access.

Choosing the Right Data Platforms

Meanwhile, think of soil as a complex and nonlinear universe on what is called Earth’s pedosphere―where our feet go. On and into soil, we farm, build, and mine. Into and from soil, climate-influencing gases go.

Soil is biotic. It hosts water-film inhabitants like bacteria, viruses, yeasts, and nematodes. In numbers, a tablespoon of soil contains more organisms than the number of people on Earth.

Soil is abiotic. Its components include worn rock containing trace elements like cobalt and nickel.

Soil is also a three-dimensional water system. Gravity or capillary action transports water vapor and moisture through a subterranean network of passages so tiny they are called pores.

In short, soil is very complex; it crosses many dimensions and many physical, biological, and chemical forms.

Without the right data architecture and sharing platforms, measuring the chemical, physical, and genetic patterns of soil will soon invite what Summer School co-organizer Montana Smith calls the “spaghetti-ness” of science.

Smith is an Earth scientist with EMSL’s Biogeochemical Transformations team. Joining her in co-organizing the conference is Darian Smercina, a postgraduate Pauling Fellow at PNNL who studies microbiome science. Pajau Vangay, a microbiome researcher at the Lawrence Berkeley National Laboratory and a lead in DOE’s National Microbiome Data Collaborative, assisted in leading Day 1 of Summer School.

Two of day one’s presenters, including Smith, delved into a promising pilot data platform called the National Microbiome Data Collaborative (NMDC)—a data science system being developed at Lawrence Berkeley National Laboratory.

More on NMDC in a minute. Soil scientists dig it.

MONet Paints a Soil-Data Picture

Smith and Smercina opened day one with a look ahead at event logistics and the week’s topics. The July 18 – 21 morning public presentations delve into metagenomics (Tuesday), soil organic matter (Wednesday), and statistics and visualization (Thursday). On Friday, July 22, Summer School’s competitively selected students will also receive special tutorials on designing experiments and making soil-science proposals.

But the true grit (dirt?) of day one began with EMSL’s John Bargar and Maggie Bowman presenting sequentially on two fresh avenues for collecting, analyzing, and formatting soil data.

Bargar, science area leader for the user facility’s Environmental Transformations and Interactions science area, introduced EMSL’s Molecular Observation Network (MONet). The mission is to observe soil processes across time and spatial scales (from field to region to continental scales). The focus is on the molecular forms of soils and soil organic matter.

“The subsurface is opaque,” and hard to measure and study, he says, which calls for more data from larger numbers of samples from more places—a huge undertaking!

With MONet as a first step at EMSL, Bargar adds, “we have to start somewhere.”

Bargar spelled out the grand challenges of soil science, which MONet shares. Included were opacity, along with the speed and transitory nature of biogeochemical activity within soils, the limits of single-point sampling, and the limited way Earth system models so far are able to represent soil processes. These limitations primarily include the mechanisms that control belowground pools of carbon.

MONet is looking into multiscale modeling approaches, rhizosphere sensors, and field sensors. But it was automated organic material analysis, the grist for models, that Bargar chose to emphasize.

“Models can help,” says Bargar, “but we need data.”

MONet is also developing data collection partnerships with DOE scientific focus area projects and with the National Science Foundation’s National Ecological Observatory Network (NEON).

Behind all this is the big picture―the role of soils in climate change.

“Soils are incredibly important to climate change,” says Bargar. “They hold vast amounts of carbon―and there is an active exchange between soils and the atmosphere.”

A Pilot Project

Bargar introduced Bowman, who pointed attendees to the 1,000 Soils Research Pilot. Part of the larger MONet project, this EMSL effort is led by Bargar and co-investigator Emily Graham. The team envisions standardized soil sampling at the scale of the continental United States, linking the data to a high-throughput workflow for molecular analysis, and then streamlining database sharing and utilization.

The utility of the campaign: insight into molecule-level understanding of carbon cycles belowground.

Bowman helped pioneer the project’s cooler-size kit for standardizing biotic and abiotic soil cores at depths of 10, 20, and 30 centimeters.

Soil cores and field kits are then shipped back to EMSL. The biotic core is processed immediately and DNA samples are collected. EMSL's X-ray computed tomography is used to determine the physical structure of cores, providing information about the air, water, and soil in the core.

The standardized core collection results include an automated workflow system to extract data points from X-ray tomography, as well as from assessments of hydraulic properties, respiration, enzyme activity, and soil chemistry. The final metagenomics step involves another DOE user facility, the Joint Genome Institute (JGI) in California.

Soon, adds Bowman, “all this data will be published and openly available.”

Like MONet, the 1,000 Soils Research Pilot has an element of outreach to science partners. These include NEON, DOE’s Environmental System Science program, the Soil Carbon Solutions Center at Colorado State University, and AmeriFlux, a 34-site network for measuring ecosystem fluxes managed at Lawrence Berkeley National Laboratory.

“We can integrate our data models into others,” says Bowman. “Our big goal is to be able to get this data and share it.”

Bargar adds a sober note: “We’ve never needed this data more than now because of intensifying climate change.”

To Absorb Exploding Omics Data: NMDC

outstretched hand holds microbes
The National Microbiome Data Collaborative Data Portal was created to standardize multi-omics microbiome data and to allow researchers to directly query data through the portal. (Composite image by Shannon Colson | Pacific Northwest National Laboratory) 

Attendees during day one of “Soils Exposed!” received an introduction to NMDC.

The first was courtesy of JGI’s Emiley Eloe-Fadrosh.

“The (omics) data is just exploding,” she says, including in the realm of soils science. “Data has outpaced data structure. How do we best support this growing volume of data?”

NMDC is one way, since one of its missions is to harmonize data across multiple systems. It addresses, to put it in Smith’s terms, the “spaghetti-ness” of science data, which gets quickly tangled without a coherent data platform.

Eloe-Fadrosh sees NMDC as a cohesive force, “a robust integration across samples and scales,” that is transitioning from its pilot phase, which started in 2019, to a production phase.

The NMDC “infrastructure backbone” comprises a submission portal, standardized workflows, and a data portal.

Its strengths include a network of partners, among them the Genomic Standards Consortium and the American Society for Microbiology. Starting in the spring of 2022, NMDC established a partnership with KBase, the DOE’s systems biology knowledgebase.

Of course, continued work with EMSL and JGI will move NMDC into the realm of data analysis.

NMDC arose because of what Eloe-Fadrosh calls a “knowledge gap” in soils data, largely because of clashing standards and terminology. “What that results in (is that) the data is very noisy.”

For instance, she says the current Genomic Standards Consortium scheme for describing genome sequences, called MIxS, requires 148 terms to describe a sample and records information on an Excel spreadsheet, which defies searchability. The NMDC submission portal, Eloe-Fadrosh says, is more forgiving.

Meanwhile, she adds, all NMDC data architecture is underlain by what are called FAIR data principles: Findability, Accessibility, Interoperability, and Reusability.

In this case, this means FAIR multi-omics microbiome data.

Next, Eloe-Fadrosh says NMDC goes beyond standardizing metadata to apply its scheme of uniformity and simplicity to improving workflows.

NMDC According to Montana Smith

Smith agrees with Eloe-Fadrosh that NMDC is a powerful step toward a data-science infrastructure for microbiome research.

“Generating multiomics data can be difficult and expensive,” she says, but NMDC offers a path to best practices in data curation and processing. “You can think of it as a card catalog.”

Her presentation focused on metadata, “the kind of stuff you usually put in notebooks,” Bargar says.

It’s vital for data to have context, says Smith, so it can be better preserved, discovered, and reused. “This is a much easier way to organize data in a large-scale experiment―though, we recognize that metadata is a daunting task and is often left to the end.”

But without such “operational data,” she adds, “we are dealing with the spaghetti-ness of science.”

Metadata includes details on the starting bio-sample, when and where it was collected, how it was prepared for analysis, and how data was generated.
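
To make those four kinds of context concrete, here is a minimal Python sketch of what a sample's metadata record might look like. It is only an illustration: the field names, the identifier, and the placeholder values are hypothetical and are not taken from the NMDC schema or its submission portal.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Hypothetical record; field names are illustrative, not the NMDC schema."""
    sample_id: str                          # the starting bio-sample
    collected_on: Optional[str] = None      # when it was collected (ISO date)
    collected_at: Optional[str] = None      # where it was collected
    preparation: Optional[str] = None       # how it was prepared for analysis
    data_generation: Optional[str] = None   # how the data was generated


def missing_fields(record: SampleMetadata) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder values only; none of these describe a real sample.
record = SampleMetadata(
    sample_id="example-core-001",
    collected_on="2022-07-18",
    collected_at="example field site",
)

print(missing_fields(record))  # ['preparation', 'data_generation']
```

A check like `missing_fields` is one simple way to catch gaps while a sample is still fresh, rather than leaving metadata to the end of an experiment.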

Peatland Case Study

Presentation slide on left and presenter Spencer Roth on right. [Metabolomic responses to experimental warming]
Spencer Roth, a postdoctoral researcher with Oak Ridge National Laboratory, presented on the SPRUCE experiment, which looks at spruce and peatland responses under changing environments. (Photo by Genoa Blankenship | EMSL)

For all those afternoon tutorials during “Soils Exposed!,” students need a common data set. 

Chris Schadt and Spencer Roth of Oak Ridge National Laboratory stepped into the virtual Summer School to explain the source of the data students would be using all week. It’s from something called Spruce and Peatland Responses under Changing Environments (SPRUCE).

The soil heating experiment, which began in 2016 to simulate the effects of a changing climate, is unfolding on a site in the Marcell Experimental Forest in northern Minnesota at the southern edge of a vast boreal region.

Peatlands represent less than 3 percent of the Earth’s surface, Schadt says, but store 30 percent of the Earth’s soil carbon.

Their experiments deployed 10 warming chambers three meters (about 10 feet) into the peat, each with a different heating signature.

The data set includes microbial community data and three years of metagenomic data.

In summary, Schadt says, increased warming is linked to increased levels of greenhouse gases in peatland soils.

Students, dig in.

Presentations from the first day of Summer School are available on EMSL LEARN.