ASTRAL-MP: Scaling ASTRAL to Very Large Datasets Using Randomization and Parallelization
Motivation: Evolutionary histories can change from one part of the genome to another. The potential for discordance among gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is growing quickly. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind data growth trends and is not able to analyze the largest available datasets in a reasonable time.
Results: ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism; it also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing numbers of CPU cores, and its GPU version, implemented in OpenCL, can achieve speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days.
Availability and Implementation: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP.
Supplementary Information: Supplementary data are available at Bioinformatics online.