
WASTER: Practical Phylogenomics from Low-coverage Short Reads

Overview
Journal bioRxiv
Date 2025 Feb 3
PMID 39896589
Abstract

The advent of affordable whole-genome sequencing has spurred numerous large-scale projects aimed at inferring the tree of life, yet a complete species-level phylogeny remains a distant goal because of significant costs and computational demands. Traditional species tree inference methods, though effective, are hampered by the need for high-coverage sequencing, high-quality genomic alignments, and extensive computational resources. To address these challenges, this study introduces WASTER, a novel tool for inferring species trees directly from short-read sequences. WASTER employs a k-mer-based approach to identify variable sites, circumventing the need for genome assembly and alignment. Using simulations, we demonstrate that WASTER achieves accuracy comparable to that of traditional alignment-based methods, even at low sequencing depth, and has substantially higher accuracy than other alignment-free methods. We validate WASTER's efficacy on real data, where it accurately reconstructs phylogenies of eukaryotic species at sequencing depths as low as 1.5X. WASTER provides a fast and efficient solution for phylogeny estimation in cases where genome assembly and/or alignment may bias analyses or are challenging, for example because of low sequencing depth. It also provides a method for generating guide trees for tree-based alignment algorithms. WASTER's ability to accurately estimate trees from low-coverage sequencing data without relying on assembly and alignment will lead to substantially reduced sequencing and computational costs in phylogenomic projects.
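The abstract does not describe how WASTER finds variable sites, so the following is only a toy illustration of the general idea behind k-mer-based variable-site detection, not WASTER's algorithm. It assumes odd-length k-mers, error-free reads, and a single strand (no reverse complements). The function names `kmers` and `variable_sites` are hypothetical. K-mers from different samples that share the same flanking bases but differ in the middle base are reported as candidate variable sites, with no assembly or alignment step.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def variable_sites(samples, k=5):
    """Find candidate variable sites from raw reads (toy sketch).

    samples: dict mapping sample name -> list of read strings.
    Returns a dict mapping a flank key (k-mer with the middle base
    masked as "_") to {sample name: set of observed middle bases},
    keeping only flanks seen in at least two samples with more than
    one distinct middle base overall.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so each k-mer has one middle base")
    mid = k // 2

    # flank key -> sample name -> middle bases observed in that sample
    table = defaultdict(lambda: defaultdict(set))
    for name, reads in samples.items():
        for read in reads:
            for km in kmers(read, k):
                flank = km[:mid] + "_" + km[mid + 1:]
                table[flank][name].add(km[mid])

    result = {}
    for flank, per_sample in table.items():
        bases = set().union(*per_sample.values())
        if len(per_sample) > 1 and len(bases) > 1:
            result[flank] = {s: set(b) for s, b in per_sample.items()}
    return result


if __name__ == "__main__":
    # Two hypothetical samples whose reads differ at a single position.
    toy = {"A": ["ACGTACC"], "B": ["ACGAACC"]}
    print(variable_sites(toy, k=5))
```

A real tool would also need to handle sequencing errors, reverse-complement strands, and repeated flanks, all of which this sketch ignores.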
