
Applications of Species Accumulation Curves in Large-scale Biological Data Analysis

Overview
Journal: Quant Biol
Publisher: Wiley
Date: 2016 Jun 3
PMID: 27252899
Citations: 18
Abstract

The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations, but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
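To make the estimator named in the abstract concrete, here is a minimal Python sketch of the classical Good-Toulmin power series, which predicts how many new species would appear if the sample were extended by a fraction t of its current size. This is an illustration only, not the authors' R package: the function names and the toy sample are assumptions, and the rational function approximation the abstract describes is not reproduced here.

```python
from collections import Counter


def count_histogram(observations):
    """Return {j: n_j}, where n_j is the number of species seen exactly j times.

    `observations` is any iterable of hashable species labels
    (a hypothetical input format chosen for this sketch).
    """
    per_species = Counter(observations)          # label -> times observed
    return dict(Counter(per_species.values()))   # j -> number of species seen j times


def good_toulmin_new_species(histogram, t):
    """Classical Good-Toulmin series: sum over j of (-1)^(j+1) * t^j * n_j.

    `t` is the extra sampling effort relative to the initial sample
    (t = 0.5 means 50% more observations).
    """
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in histogram.items())


# Toy sample (made up for illustration): five species, eight observations.
sample = ["a", "a", "a", "b", "b", "c", "d", "e"]
hist = count_histogram(sample)
observed = len(set(sample))
expected_new = good_toulmin_new_species(hist, 0.5)
print(f"observed species: {observed}")
print(f"expected new species after 50% more sampling: {expected_new}")
```

The alternating series is usually regarded as reliable only for t up to about 1, and it becomes unstable for larger extrapolations. That instability is the motivation for replacing the raw series with rational function approximations when predicting far beyond the initial sample.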

Citing Articles

Generalization of the sci-L3 method to achieve high-throughput linear amplification for replication template strand sequencing, genome conformation capture, and the joint profiling of RNA and chromatin accessibility.

Chovanec P, Yin Y. Nucleic Acids Res. 2025; 53(4).

PMID: 39997216 PMC: 11851118. DOI: 10.1093/nar/gkaf101.


Dormancy-inducing 3D engineered matrix uncovers mechanosensitive and drug-protective FHL2-p21 signaling axis.

Bakhshandeh S, Heras U, Taieb H, Varadarajan A, Lissek S, Hucker S. Sci Adv. 2024; 10(45):eadr3997.

PMID: 39504377 PMC: 11540038. DOI: 10.1126/sciadv.adr3997.


DNA barcodes provide insights into the diversity and biogeography of the non-biting midge (Diptera, Chironomidae) in South America.

da Silva F, Pinho L, Stur E, Nihei S, Ekrem T. Ecol Evol. 2023; 13(10):e10602.

PMID: 37841227 PMC: 10568203. DOI: 10.1002/ece3.10602.


Woody species diversity and regeneration status of Sub-Alpine forest of Mount Adama exclosure site, Northwestern highlands of Ethiopia.

Mengistu D, Bekele D, Gela A, Meshesha D, Getahun M. Heliyon. 2023; 9(6):e16473.

PMID: 37251442 PMC: 10220367. DOI: 10.1016/j.heliyon.2023.e16473.


The use of ecological analytical tools as an unconventional approach for untargeted metabolomics data analysis: the case of Cecropia obtusifolia and its adaptive responses to nitrate starvation.

Cadena-Zamudio J, Monribot-Villanueva J, Perez-Torres C, Alatorre-Cobos F, Jimenez-Moraila B, Guerrero-Analco J. Funct Integr Genomics. 2022; 22(6):1467-1493.

PMID: 36199002 PMC: 9701659. DOI: 10.1007/s10142-022-00904-1.

