
Day 4: 2022 EMSL Summer School—Statistics and Visualization

Using mass spectrometry data and visualization tools to answer diverse science questions

Genoa Blankenship
Ruonan Wu speaks on screen during the virtual 2022 EMSL Summer School.

Ruonan Wu, a computational scientist at Pacific Northwest National Laboratory, spoke on the final day of public presentations at the 2022 EMSL Summer School about viruses and about using data and visualization to explore science questions. (Photo by Dawn Stringer | EMSL)

Analyzing data makes Ruonan Wu feel like a detective.

Wu, a computational scientist at Pacific Northwest National Laboratory (PNNL), uses data analysis tools to identify scientific patterns and to reveal hidden data. After learning about the “secrets” behind data, Wu uses visualization techniques to communicate a compelling science story to the research community.

“The data always serves as a pool where you can retrieve information to answer your own questions,” says Wu.

On the final day of the 2022 EMSL Summer School: Soils Exposed! virtual event, Wu shared one of her science stories—research detecting viral signals from metagenomes—and how she analyzed viruses from a metagenomic point of view.

“Ruonan explored a topic we hadn't heard about yet—viruses,” says Darian Smercina, a Pauling Fellow and PNNL soil microbial ecologist. “Viruses are so important in soil. Ruonan is using metagenomic techniques to study them. Her work shows some really amazing ways to visualize complex metagenomic data, but also provides some cautionary insight about how to carefully interpret results and not misinterpret findings.”

Smercina and Montana Smith, an Earth scientist with the Department of Energy’s Environmental Molecular Sciences Laboratory (EMSL), co-organized this year’s Summer School, which took place July 18-22. The Summer School, co-sponsored by the National Microbiome Data Collaborative, included four days of public presentations on metadata, soil organic matter, and metagenomics. In addition, two dozen students, selected through an admission process, attended private tutorials over the course of five days.

Tools for Analysis

On day four, talks turned to available data and visualization tools for exploring data generated from Fourier transform mass spectrometry (FT-MS) instruments.

The presentations were led by a group of PNNL data scientists: David Degnan, Damon Leach, Daniel Claborne, and Natalie Winans.

The group continued the discussion of CoreMS, a mass spectrometry framework for small molecule analysis that EMSL speakers Yuri Corilo, a computational scientist, and Will Kew, a chemist, covered on day three.

Winans demonstrated how to convert mass spectrometry data with compounds identified in CoreMS for use in a filtering and visualization tool called the FT-MS R Exploratory Data Analysis tool, or FREDA. The tool lets users upload and analyze data from FTICR-MS instruments.

Claborne demonstrated how to process data in FREDA and export it into objects that can undergo differential abundance statistics with the pmartR package. Leach and Degnan taught the pmartR section of the session. Users had the opportunity to filter, normalize, and run statistical analyses on datasets in FREDA and pmartR, learning how to use these tools to answer specific research questions and test hypotheses.
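For readers unfamiliar with that filter, normalize, and test sequence, the general idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the FREDA or pmartR interface (both are R tools, and their actual methods and defaults may differ). The peak names, intensity values, presence threshold, median-centering step, and choice of Welch's t statistic are all assumptions made up for this example.

```python
import math
import statistics

# Hypothetical peak intensities, invented for illustration only.
# Columns 0-2 are "control" samples, columns 3-5 are "treated" samples.
# None means the peak was not observed in that sample.
table = {
    "peak_a": [100.0, 120.0, 110.0, 400.0, 420.0, 440.0],
    "peak_b": [50.0, 55.0, 60.0, 52.0, 58.0, 54.0],
    "peak_c": [None, None, 30.0, None, None, None],
    "peak_d": [200.0, 210.0, 190.0, 205.0, 195.0, 215.0],
    "peak_e": [150.0, 160.0, 155.0, 158.0, 152.0, 161.0],
}


def filter_by_presence(data, min_observed):
    """Keep only peaks observed in at least min_observed samples."""
    return {
        peak: vals
        for peak, vals in data.items()
        if sum(v is not None for v in vals) >= min_observed
    }


def log2_median_center(data):
    """Log2-transform intensities, then subtract each sample's median."""
    n_samples = len(next(iter(data.values())))
    logged = {
        peak: [math.log2(v) if v is not None else None for v in vals]
        for peak, vals in data.items()
    }
    medians = []
    for j in range(n_samples):
        column = [vals[j] for vals in logged.values() if vals[j] is not None]
        medians.append(statistics.median(column))
    return {
        peak: [v - medians[j] if v is not None else None
               for j, v in enumerate(vals)]
        for peak, vals in logged.items()
    }


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups, ignoring unobserved values."""
    a = [x for x in group_a if x is not None]
    b = [x for x in group_b if x is not None]
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


filtered = filter_by_presence(table, min_observed=4)
normalized = log2_median_center(filtered)
t_stats = {
    peak: welch_t(vals[:3], vals[3:]) for peak, vals in normalized.items()
}

for peak, t in t_stats.items():
    print(f"{peak}: t = {t:.2f}")
```

A large positive or negative t statistic flags a peak whose normalized abundance differs between the two groups. A real analysis would also compute p-values and correct for multiple comparisons.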

Metagenomic Data and Methane Production

Four people appear onscreen during a presentation.
Susannah Tringe, division director of Environmental Genomics and Systems Biology at Lawrence Berkeley National Laboratory (Berkeley Lab), spoke on how metagenomic data and metabolomic data answered questions about the biological controls in methane production in wetlands. (Photo by Genoa Blankenship | EMSL)

Susannah Tringe, the division director of Environmental Genomics and Systems Biology at Lawrence Berkeley National Laboratory (Berkeley Lab), demonstrated how important data tools are for research. She presented her research, which explored microbial drivers of methane emissions in the San Francisco Bay wetlands.

Wetlands cover 9 percent of global land area but store 35 percent of all terrestrial carbon, notes Tringe, who is also the microbial systems group lead for the Joint Genome Institute user facility.

She discussed research involving carbon cycling, wetland restoration, methane, and microbial communities in the Delta and San Francisco Bay. The South Bay Salt Pond Restoration Project is focused on restoring more than 15,000 acres of former industrial salt ponds.

Tringe noted how sequence-based methods revealed genes and organisms associated with differences in methane flux. She also addressed how metagenomics suggested possible mechanisms for unexpectedly high methane production from unrestored salt ponds.

“The team has discovered that site, salinity, biogeochemistry, and microbial activities are important to consider when planning wetland restoration efforts and predicting greenhouse gas emissions,” says Tringe.

The team is working with the Department of Energy’s Systems Biology Knowledgebase (KBase) to build community metabolic models. These models will be compared with metabolomics data from EMSL to identify methanogenesis substrates. So far, the research has led to the detection of 46 metabolites.

“Susannah's talk shows us a unique system—wetlands/salt ponds—and works to integrate metagenomic data and natural organic matter (e.g. metabolomic data) with biogeochemical flux rates to answer key questions about the biological controls on methane production in wetlands,” says Smercina.

Watch recordings from day four presentations on EMSL LEARN.