
GraftM: a Tool for Scalable, Phylogenetically Informed Classification of Genes Within Metagenomes

Overview
Specialty: Biochemistry
Date: 2018 Mar 22
PMID: 29562347
Citations: 70
Abstract

Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can give a more comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0-3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
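
To illustrate why placement into a gene tree can avoid over-classification, the following is a minimal Python sketch of one common way to turn a placement into a taxonomic label: take the reference leaves that descend from the placement edge and report only the ranks they all share. This is not GraftM's code and may differ from what GraftM actually does; the function name `consensus_taxonomy`, the leaf identifiers, and the placeholder rank labels are all hypothetical.

```python
from typing import Dict, List

def consensus_taxonomy(lineages: List[List[str]]) -> List[str]:
    """Return the ranks shared by every lineage, from the root downwards.

    Each lineage is a list of rank labels ordered from the highest rank
    to the lowest. The result stops at the first rank where the
    lineages disagree, so a read placed on a deep branch receives a
    coarse label instead of an overly specific one.
    """
    if not lineages:
        return []
    shared: List[str] = []
    for ranks in zip(*lineages):  # walk the ranks in parallel
        if len(set(ranks)) != 1:  # lineages diverge at this rank
            break
        shared.append(ranks[0])
    return shared

# Hypothetical taxonomy of three reference leaves in a gene tree.
LEAF_TAXONOMY: Dict[str, List[str]] = {
    "ref_a": ["k__K1", "p__P1", "c__C1", "o__O1", "f__F1"],
    "ref_b": ["k__K1", "p__P1", "c__C1", "o__O1", "f__F2"],
    "ref_c": ["k__K1", "p__P1", "c__C2", "o__O3", "f__F5"],
}

# Hypothetical placements: each read maps to the leaves below its edge.
shallow = consensus_taxonomy([LEAF_TAXONOMY[x] for x in ("ref_a", "ref_b")])
deep = consensus_taxonomy([LEAF_TAXONOMY[x] for x in LEAF_TAXONOMY])

print("; ".join(shallow))
print("; ".join(deep))
```

In this toy case, a read placed on the edge above `ref_a` and `ref_b` is labelled down to order, while a read placed above all three leaves is labelled only to phylum.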

Citing Articles

Methane trapping in permafrost soils: a biogeochemical dataset across Alaskan boreal-Arctic gradient.

Kim J, Kim Y, Nam S, Jung J, Kim Y, Hwang J. Sci Data. 2025; 12(1):110.

PMID: 39833284 PMC: 11747616. DOI: 10.1038/s41597-025-04463-5.


Global niche partitioning of purine and pyrimidine cross-feeding among ocean microbes.

Braakman R, Satinsky B, O'Keefe T, Longnecker K, Hogle S, Becker J. Sci Adv. 2025; 11(1):eadp1949.

PMID: 39752493 PMC: 11698098. DOI: 10.1126/sciadv.adp1949.


Global metagenomic survey identifies sewage-derived hgcAB microorganisms as key contributors to riverine methylmercury production.

Xia J, Yuan Z, Jiang F. Nat Commun. 2024; 15(1):9262.

PMID: 39461941 PMC: 11513008. DOI: 10.1038/s41467-024-53479-9.


Microbiome-metabolite linkages drive greenhouse gas dynamics over a permafrost thaw gradient.

Freire-Zapata V, Holland-Moritz H, Cronin D, Aroney S, Smith D, Wilson R. Nat Microbiol. 2024; 9(11):2892-2908.

PMID: 39354152 PMC: 11522005. DOI: 10.1038/s41564-024-01800-z.


Phylogenetic proximity drives temporal succession of marine giant viruses in a five-year metagenomic time-series.

Laperriere S, Minch B, Weissman J, Hou S, Yeh Y, Ignacio-Espinoza J. bioRxiv. 2024.

PMID: 39185240 PMC: 11343133. DOI: 10.1101/2024.08.12.607631.

