The NMDC Data Portal: An Integrated Multi-omics Microbiome Data Resource

Making microbiome data FAIR—Findable, Accessible, Interoperable, and Reusable 

The National Microbiome Data Collaborative supports multi-omics data exploration across diverse microbiomes. (Composite image by Shannon Colson | Pacific Northwest National Laboratory)

The Science  

Multi-omics technologies have produced an abundance of data from microbial samples. However, samples and data are often processed and analyzed in different ways, making it difficult for researchers to use datasets from other laboratories. Data may also be dispersed across different repositories, making them difficult to find and access. Researchers created the National Microbiome Data Collaborative (NMDC) Data Portal to standardize multi-omics microbiome data and make them accessible through an easy-to-use interface. The data portal hosts 10.2 terabytes of data from metagenome, metatranscriptome, metaproteome, metabolome, and natural organic matter characterizations generated at two Department of Energy (DOE) Office of Science User Facilities: the Environmental Molecular Sciences Laboratory (EMSL) and the Joint Genome Institute (JGI).

The Impact 

The NMDC Data Portal brings FAIR principles (findability, accessibility, interoperability, and reusability) to microbiome research by providing a single location where researchers can search for integrated and standardized microbiome data. It enables researchers to explore the data from multi-omics analyses performed on a sample in various ways: by functional annotation, environment, or analysis type. The data portal also lowers the barrier to entry into microbiome research by allowing researchers to query data directly through the portal. With these features, the data portal can accelerate microbiome research, such as efforts to understand how microbiomes interact with their environments and how they can be harnessed for sustainable bioenergy solutions.

Summary 

Data available in the National Microbiome Data Collaborative (NMDC) Data Portal have been carefully curated by a multi-institutional team and represented in a data schema that defines studies, samples, data objects, and the relationships among them. A set of standard terms has been implemented to describe the microbiome samples, and standardized open-source bioinformatics workflows were developed to process raw multi-omics data (e.g., metagenome, metatranscriptome, metaproteome, metabolome, and natural organic matter data) into interoperable and reusable annotated data products. The data schema and underlying organization of the data let users refine and subset NMDC data through faceted search and interactive visualizations. The team has also established a distributed data infrastructure that supports coordination of sample management and data hosting protocols between the Environmental Molecular Sciences Laboratory (EMSL) and the Joint Genome Institute (JGI), two Department of Energy (DOE) Office of Science User Facilities, thereby avoiding data duplication.
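To make the idea of a schema that links studies, samples, and data objects more concrete, the following is a minimal Python sketch of how such records could support faceted search. It is an illustration only and not the NMDC schema or portal code: the class names (`Sample`, `DataObject`), their fields, the helper functions (`facet_counts`, `filter_samples`), and the example records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataObject:
    """Hypothetical data product attached to a sample."""
    object_id: str
    omics_type: str  # e.g. "metagenome", "metaproteome"


@dataclass
class Sample:
    """Hypothetical sample record that points back to its study."""
    sample_id: str
    study_id: str
    environment: str
    data_objects: List[DataObject] = field(default_factory=list)


def facet_counts(samples, facet):
    """Count samples per value of a facet ("environment" or "omics_type")."""
    counts = Counter()
    for sample in samples:
        if facet == "omics_type":
            # A sample counts once per distinct omics type it has data for.
            values = {obj.omics_type for obj in sample.data_objects}
        else:
            values = {getattr(sample, facet)}
        counts.update(values)
    return dict(counts)


def filter_samples(samples, environment: Optional[str] = None,
                   omics_type: Optional[str] = None):
    """Return the samples that match every facet value that was supplied."""
    result = []
    for sample in samples:
        if environment is not None and sample.environment != environment:
            continue
        if omics_type is not None and all(
            obj.omics_type != omics_type for obj in sample.data_objects
        ):
            continue
        result.append(sample)
    return result


# Made-up example records, for illustration only.
samples = [
    Sample("s1", "study-A", "soil",
           [DataObject("d1", "metagenome"), DataObject("d2", "metaproteome")]),
    Sample("s2", "study-A", "soil", [DataObject("d3", "metagenome")]),
    Sample("s3", "study-B", "freshwater", [DataObject("d4", "metabolome")]),
]

soil_proteomics = filter_samples(samples, environment="soil",
                                 omics_type="metaproteome")
```

In this sketch, `facet_counts` produces the per-value tallies that a faceted search interface typically shows next to each filter, and `filter_samples` narrows the sample list as filters are combined. Because each data object is linked to one sample and each sample to one study, the same records can be grouped by environment, by analysis type, or by study without duplicating data.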

Data currently available in the NMDC Data Portal are focused on multi-omics data from environmental microbiomes generated by joint JGI-EMSL projects under the Facilities Integrating Collaborations for User Science (FICUS) program. The team plans to expand the data portal to include additional JGI- and EMSL-generated data, as well as user submissions.

The search portal is hosted on the Kubernetes cluster managed by the National Energy Research Scientific Computing Center.

Contact 

Lee Ann McCue 
Environmental Molecular Sciences Laboratory 
leeann.mccue@pnnl.gov

Funding 

Development of the NMDC Data Portal is supported by the Genomic Science Program in the Department of Energy Office of Science, Biological and Environmental Research program.

Publication

EA Eloe-Fadrosh, et al. 2021. “The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource,” Nucleic Acids Research, 50:D828-D836. DOI: 10.1093/nar/gkab990