
Time and Memory Efficient Likelihood-based Tree Searches on Phylogenomic Alignments with Missing Data

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2010 Jun 10
PMID: 20529898
Citations: 55
Abstract

Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A characteristic property of phylogenomic datasets is that they tend to be gappy, i.e. they can contain taxa with many and disparate missing genes. In current phylogenomic analyses, the alignment gappyness induced by such missing data frequently exceeds 90%. We present and implement a generally applicable mechanism that reduces the memory footprint of likelihood-based [maximum likelihood (ML) or Bayesian] phylogenomic analyses in proportion to the amount of missing data in the alignment. We also introduce a set of algorithmic rules for efficiently conducting tree searches via subtree pruning and re-grafting moves using this mechanism.
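The idea of memory use shrinking with missing data can be illustrated with a minimal sketch. It is not the RAxML implementation: it assumes a rooted binary tree written as nested tuples, a hypothetical per-taxon presence list `present`, and hypothetical gene lengths, and it simply counts how many per-site likelihood entries the inner nodes would need if a gene's vector is stored only where at least one tip below the node has data for that gene.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a tip name or a pair (left subtree, right subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def genes_below(tree: Tree, present: Dict[str, List[bool]],
                inner_flags: List[List[bool]]) -> List[bool]:
    """Return, per gene, whether any tip below `tree` has data.

    The flags of every inner node are appended to `inner_flags`.
    """
    if isinstance(tree, str):  # tip: use its presence row directly
        return present[tree]
    left = genes_below(tree[0], present, inner_flags)
    right = genes_below(tree[1], present, inner_flags)
    flags = [a or b for a, b in zip(left, right)]
    inner_flags.append(flags)
    return flags


def clv_entries(tree: Tree, present: Dict[str, List[bool]],
                gene_lengths: List[int]) -> Tuple[int, int]:
    """Count per-site entries at inner nodes: (store everything, skip empty genes)."""
    inner_flags: List[List[bool]] = []
    genes_below(tree, present, inner_flags)
    full = len(inner_flags) * sum(gene_lengths)
    reduced = sum(length
                  for flags in inner_flags
                  for has_data, length in zip(flags, gene_lengths)
                  if has_data)
    return full, reduced


# Hypothetical toy data: taxa A and B lack gene 2, taxa C and D lack gene 1.
tree = (("A", "B"), ("C", "D"))
present = {"A": [True, False], "B": [True, False],
           "C": [False, True], "D": [False, True]}
full, reduced = clv_entries(tree, present, gene_lengths=[100, 200])
print(full, reduced)
```

In this toy case only the root needs entries for both genes, so the count drops from 900 to 600. A real likelihood implementation works on unrooted trees and must also handle vector orientation during tree search, which this sketch ignores.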

Results: On a large phylogenomic DNA dataset with 2177 taxa, 68 genes and a gappyness of 90%, we achieve a memory footprint reduction from 9 GB down to 1 GB, an 11-fold speedup for optimizing ML model parameters, and a 16-fold acceleration of the subtree pruning and re-grafting (SPR) tree search phase. Thus, our approach can be deployed to improve efficiency for the two most important resources, CPU time and memory, by up to one order of magnitude.
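As a back-of-the-envelope check of the reported memory figures, the short sketch below compares them with an idealized model in which the footprint scales exactly with the fraction of non-missing data. That linear model and the function name `ideal_footprint` are assumptions made here for illustration; only the 9 GB, 1 GB and 90% values come from the abstract.

```python
def ideal_footprint(full_gb: float, gappyness: float) -> float:
    """Footprint if memory scaled exactly with the non-missing fraction (assumed model)."""
    return full_gb * (1.0 - gappyness)


full_gb = 9.0      # reported footprint without the mechanism
reported_gb = 1.0  # reported footprint with the mechanism
gappyness = 0.90   # reported fraction of missing data

ideal_gb = ideal_footprint(full_gb, gappyness)
reported_reduction = full_gb / reported_gb

print(f"idealized footprint: {ideal_gb:.1f} GB")
print(f"reported reduction factor: {reported_reduction:.0f}x")
```

Under this assumption the idealized footprint is about 0.9 GB, close to the reported 1 GB, so the reported ninefold reduction is in line with what 90% gappyness would allow.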

Availability: The current open-source version, RAxML v7.2.6, is available at http://wwwkramer.in.tum.de/exelixis/software.html.

Citing Articles

Proteomics analysis revealed the activation and suppression of different host defense components challenged with mango leaf spot pathogen Alternaria alternata.

Xie X, Yang Z, Li D, Liu Z, Li X, Zhu Z BMC Plant Biol. 2025; 25(1):227.

PMID: 39972448 PMC: 11837451. DOI: 10.1186/s12870-025-06250-1.


Terraces in species tree inference from gene trees.

Habib M, Roy K, Hasan S, Rahman A, Bayzid M BMC Ecol Evol. 2024; 24(1):135.

PMID: 39497030 PMC: 11533290. DOI: 10.1186/s12862-024-02309-z.


Gentrius: Generating Trees Compatible With a Set of Unrooted Subtrees and its Application to Phylogenetic Terraces.

Chernomor O, Elgert C, von Haeseler A Mol Biol Evol. 2024; 41(11).

PMID: 39431557 PMC: 11536181. DOI: 10.1093/molbev/msae219.


Multi-gene phylogenetic analyses revealed two novel species and one new record of (Pleosporales, Dictyosporiaceae) from China.

Zhang W, Xu G, Liu Y, Gao Y, Song H, Hu H MycoKeys. 2024; 106:117-132.

PMID: 38948914 PMC: 11211656. DOI: 10.3897/mycokeys.106.123279.


A new species of (Sordariomycetes, Chaetosphaeriales, Chaetosphaeriaceae) from freshwater habitats in China.

Yan X, Huang J, Song H, Gao Y, Hu H, Zhai Z Biodivers Data J. 2024; 11:e97439.

PMID: 38327284 PMC: 10848523. DOI: 10.3897/BDJ.11.e97439.

