Terabase-scale Metagenome Coassembly with MetaHipMer

Overview

Journal: Sci Rep

Specialty: Science

Date: 2020 Jul 3

PMID: 32612216

Citations: 13

Authors

Steven Hofmeyr

Rob Egan

Evangelos Georganas

Alex C Copeland

Robert Riley

Alicia Clum

Emiley Eloe-Fadrosh

Simon Roux

Eugene Goltsman

Aydin Buluc

Daniel Rokhsar

Leonid Oliker

Katherine Yelick

Abstract

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Citing Articles

GenomeOcean: An Efficient Genome Foundation Model Trained on Large-Scale Metagenomic Assemblies.

Zhou Z, Riley R, Kautsar S, Wu W, Egan R, Hofmeyr S. bioRxiv. 2025.

PMID: 39975405 PMC: 11838515. DOI: 10.1101/2025.01.30.635558.

Metagenome-assembled-genomes recovered from the Arctic drift expedition MOSAiC.

Boulton W, Salamov A, Grigoriev I, Calhoun S, Labutti K, Riley R. Sci Data. 2025; 12(1):204.

PMID: 39904998 PMC: 11794607. DOI: 10.1038/s41597-025-04525-8.

A metagenomic perspective on the microbial prokaryotic genome census.

Wu D, Seshadri R, Kyrpides N, Ivanova N. Sci Adv. 2025; 11(3):eadq2166.

PMID: 39823337 PMC: 11740963. DOI: 10.1126/sciadv.adq2166.

Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota.

Oliver T, Varghese N, Roux S, Schulz F, Huntemann M, Clum A. Sci Data. 2024; 11(1):966.

PMID: 39231974 PMC: 11374980. DOI: 10.1038/s41597-024-03826-8.

From soil to sequence: filling the critical gap in genome-resolved metagenomics is essential to the future of soil microbial ecology.

Anthony W, Allison S, Broderick C, Rodriguez L, Clum A, Cross H. Environ Microbiome. 2024; 19(1):56.

PMID: 39095861 PMC: 11295382. DOI: 10.1186/s40793-024-00599-w.
