
Estimating DNA Coverage and Abundance in Metagenomes Using a Gamma Approximation

Overview
Journal Bioinformatics
Specialty Biology
Date 2009 Dec 17
PMID 20008478
Citations 13
Abstract

Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
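The abstract does not spell out the estimator, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' method. It assumes the number of reads landing on a bin is Poisson with a rate that is gamma distributed across bins (shape `shape`, scale `scale`, both taken as already fitted to the data). Under that assumption the probability that a bin receives no reads is `(1 + scale) ** (-shape)`, which gives an estimate of the bins missed so far and of the bins that more sequencing might reveal. All function and parameter names here are hypothetical.

```python
def p_unseen(shape: float, scale: float, effort: float = 1.0) -> float:
    """Probability that a bin receives zero reads.

    Assumes reads per bin ~ Poisson(rate) with rate ~ Gamma(shape, scale).
    `effort` multiplies the sequencing depth (1.0 = the current dataset).
    """
    if shape <= 0 or scale <= 0 or effort <= 0:
        raise ValueError("shape, scale and effort must be positive")
    # E[exp(-effort * rate)] for a gamma-distributed rate.
    return (1.0 + effort * scale) ** (-shape)


def estimate_total_bins(observed_bins: float, shape: float, scale: float) -> float:
    """Estimated total bins (seen + unseen) given the number observed."""
    return observed_bins / (1.0 - p_unseen(shape, scale))


def expected_new_bins(observed_bins: float, shape: float, scale: float,
                      effort: float) -> float:
    """Bins expected to be newly covered if depth is scaled by `effort`."""
    total = estimate_total_bins(observed_bins, shape, scale)
    return total * (p_unseen(shape, scale) - p_unseen(shape, scale, effort))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    observed, shape, scale = 100, 1.0, 1.0
    total = estimate_total_bins(observed, shape, scale)
    print("estimated total bins:", total)
    print("estimated unseen bins:", total - observed)
    print("new bins at 3x depth:", expected_new_bins(observed, shape, scale, 3.0))
```

The sketch takes the gamma parameters as given. Fitting them to observed coverage counts, and checking that the fit is adequate, is the step the abstract flags as a precondition for trusting any such estimate.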

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Rapid and Comprehensive Identification of Nontuberculous Mycobacteria.

Matsumoto Y, Nakamura S. Methods Mol Biol. 2023; 2632:247-255.

PMID: 36781733 DOI: 10.1007/978-1-0716-2996-3_17.


RNA Viruses Linked to Eukaryotic Hosts in Thawed Permafrost.

Wu R, Bottos E, Danna V, Stegen J, Jansson J, Davison M. mSystems. 2022; 7(6):e0058222.

PMID: 36453933 PMC: 9765123. DOI: 10.1128/msystems.00582-22.


Estimating the Optimum Coverage and Quality of Amplicon Sequencing With Taylor's Power Law Extensions.

Ma Z. Front Bioeng Biotechnol. 2020; 8:372.

PMID: 32500062 PMC: 7242763. DOI: 10.3389/fbioe.2020.00372.


Estimating the total genome length of a metagenomic sample using k-mers.

Hua K, Zhang X. BMC Genomics. 2019; 20(Suppl 2):183.

PMID: 30967110 PMC: 6456951. DOI: 10.1186/s12864-019-5467-x.


Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity.

Rodriguez-R L, Gunturu S, Tiedje J, Cole J, Konstantinidis K. mSystems. 2018; 3(3).

PMID: 29657970 PMC: 5893860. DOI: 10.1128/mSystems.00039-18.

