
Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads Under Maximum Likelihood

Overview
Journal: Syst Biol
Specialty: Biology
Date: 2011 Mar 26
PMID: 21436105
Citations: 204
Abstract

We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
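The pairwise baseline that the abstract compares against (placement by edit distance) can be illustrated with a minimal sketch. This is not the EPA and not the authors' implementation: it only shows the idea of assigning a short read to the reference sequence it matches most closely. The function names `fitting_edit_distance` and `nearest_reference`, and the toy sequences in the usage note, are hypothetical. The distance used here is an assumption as well: a standard approximate-substring edit distance, chosen so that a short read is not penalized for being shorter than a full-length reference.

```python
from typing import Dict, Tuple


def fitting_edit_distance(read: str, reference: str) -> int:
    """Fewest edits needed to turn `read` into some substring of `reference`.

    Row 0 of the dynamic-programming table is all zeros, so the match may
    start at any reference position at no cost; taking the minimum of the
    last row lets it end anywhere as well.
    """
    prev = [0] * (len(reference) + 1)
    for i, read_char in enumerate(read, start=1):
        # Column 0: the first i read characters matched against nothing.
        curr = [i] + [0] * len(reference)
        for j, ref_char in enumerate(reference, start=1):
            substitution = prev[j - 1] + (0 if read_char == ref_char else 1)
            skip_read_char = prev[j] + 1
            skip_ref_char = curr[j - 1] + 1
            curr[j] = min(substitution, skip_read_char, skip_ref_char)
        prev = curr
    return min(prev)


def nearest_reference(read: str, references: Dict[str, str]) -> Tuple[str, int]:
    """Return (taxon name, distance) of the closest reference sequence.

    Ties are broken by taxon name so the result is deterministic.
    """
    scored = [(fitting_edit_distance(read, seq), name)
              for name, seq in references.items()]
    distance, name = min(scored)
    return name, distance
```

For example, with the toy references `{"taxonA": "ACGTACGTAC", "taxonB": "TTGGCCAATT"}`, calling `nearest_reference("GGCCTA", references)` assigns the read to `taxonB`. A baseline of this kind can only attach a read to an existing tip of the tree, whereas the EPA described in the abstract assigns reads to edges of the reference tree under maximum likelihood.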

Citing Articles

Scalable method for exploring phylogenetic placement uncertainty with custom visualizations using and .

Chen M, Luo X, Xu S, Li L, Li J, Xie Z. Imeta. 2025; 4(1):e269.

PMID: 40027482 PMC: 11865327. DOI: 10.1002/imt2.269.


Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads.

Bettisworth B, Psonis N, Poulakakis N, Pavlidis P, Stamatakis A. Mol Biol Evol. 2025; 42(2).

PMID: 39823473 PMC: 11839404. DOI: 10.1093/molbev/msaf006.


Comparative Analysis of Protist Communities in Oilsands Tailings Using Amplicon Sequencing and Metagenomics.

Zahonova K, Kaur H, Furgason C, Smirnova A, Dunfield P, Dacks J. Environ Microbiol. 2025; 27(1):e70029.

PMID: 39797470 PMC: 11724239. DOI: 10.1111/1462-2920.70029.


Testing Phylogenetic Placement Accuracy of DNA Barcode Sequences on a Fish Backbone Tree: Implications of Backbone Tree Completeness and Species Representation.

Fernando M, Fu J, Adamowicz S. Ecol Evol. 2025; 15(1):e70817.

PMID: 39781258 PMC: 11706799. DOI: 10.1002/ece3.70817.


Transcriptomic Data Reveal Divergent Paths of Chitinase Evolution Underlying Dietary Convergence in Anteaters and Pangolins.

Allio R, Teullet S, Lutgen D, Magdeleine A, Koual R, Tilak M. Genome Biol Evol. 2025; 17(2).

PMID: 39780438 PMC: 11789784. DOI: 10.1093/gbe/evaf002.

