
Representing and Extending Ensembles of Parsimonious Evolutionary Histories with a Directed Acyclic Graph

Overview
Journal J Math Biol
Date 2023 Oct 25
PMID 37878119
Abstract

In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques; however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal, one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g., ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to represent only maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally parsimonious trees, extending combinatorially beyond the ensemble used to construct it. We argue that this object could be useful as the "skeleton" of a more complete uncertainty quantification.
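The idea of merging labeled trees into one graph, counting the histories it contains, and trimming to the most parsimonious ones can be illustrated with a minimal sketch. This is not the authors' implementation: the tuple-based tree format, the `HistoryDag` class, the use of a root set instead of a single universal ancestor node, and Hamming distance as the parsimony cost are all assumptions made for illustration.

```python
from math import prod

# Toy tree format (hypothetical): (label, [subtrees]); a leaf has no subtrees.

def clade(node):
    """Set of leaf labels below a DAG node."""
    label, child_clades = node
    return frozenset().union(*child_clades) if child_clades else frozenset([label])

def hamming(a, b):
    """Number of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

class HistoryDag:
    def __init__(self):
        self.edges = {}     # node -> {child clade -> set of child nodes}
        self.roots = set()

    def add_tree(self, tree):
        self.roots.add(self._add(tree))

    def _add(self, tree):
        label, subtrees = tree
        kids = [self._add(s) for s in subtrees]
        # A node is its label plus the clades of its children, so equal
        # nodes coming from different trees collapse into one.
        node = (label, frozenset(clade(k) for k in kids))
        slots = self.edges.setdefault(node, {})
        for k in kids:
            slots.setdefault(clade(k), set()).add(k)
        return node

    def count(self, node=None):
        """Number of histories: choose one child per child clade, recursively."""
        if node is None:
            return sum(self.count(r) for r in self.roots)
        return prod(sum(self.count(k) for k in kids)
                    for kids in self.edges[node].values())

    def min_cost(self, node):
        """Smallest total Hamming cost of any history below `node`."""
        return sum(min(hamming(node[0], k[0]) + self.min_cost(k) for k in kids)
                   for kids in self.edges[node].values())

    def trim(self):
        """Keep only edges that lie on a minimum-cost history; return that cost."""
        best = min(self.min_cost(r) for r in self.roots)
        self.roots = {r for r in self.roots if self.min_cost(r) == best}
        kept, stack = {}, list(self.roots)
        while stack:
            node = stack.pop()
            if node in kept:
                continue
            kept[node] = {}
            for cl, kids in self.edges[node].items():
                cost = {k: hamming(node[0], k[0]) + self.min_cost(k) for k in kids}
                low = min(cost.values())
                kept[node][cl] = {k for k in kids if cost[k] == low}
                stack.extend(kept[node][cl])
        self.edges = kept
        return best

def cherry(label, a, b):
    """Internal node `label` with two leaf children."""
    return (label, [(a, []), (b, [])])

t1 = ("AG", [cherry("AA", "AA", "AC"), cherry("GG", "GG", "TG")])
t2 = ("AG", [cherry("AC", "AA", "AC"), cherry("TG", "GG", "TG")])
t3 = ("AG", [cherry("CC", "AA", "AC"), cherry("GG", "GG", "TG")])  # costlier

dag = HistoryDag()
for t in (t1, t2, t3):
    dag.add_tree(t)
before = dag.count()   # histories represented by the merged graph
best = dag.trim()      # minimum parsimony score
after = dag.count()    # histories that achieve that score
```

In this toy example the three input trees share a root node, so the merged graph holds 6 histories. Trimming leaves 4 histories of cost 4, of which only two (`t1` and `t2`) were in the input. The other two arise from recombining their subtrees, which is the kind of combinatorial extension the abstract describes.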

Citing Articles

Finding high posterior density phylogenies by systematically extending a directed acyclic graph.

Jennings-Shaffer C, Rich D, Macaulay M, Karcher M, Ganapathy T, Kiami S Algorithms Mol Biol. 2025; 20(1):2.

PMID: 40022201 PMC: 11869616. DOI: 10.1186/s13015-025-00273-x.


Leveraging DAGs to improve context-sensitive and abundance-aware tree estimation.

Dumm W, Ralph D, DeWitt W, Vora A, Araki T, Victora G Philos Trans R Soc Lond B Biol Sci. 2025; 380(1919):20230315.

PMID: 39976415 PMC: 11867150. DOI: 10.1098/rstb.2023.0315.


Accurate Bayesian phylogenetic point estimation using a tree distribution parameterized by clade probabilities.

Berling L, Klawitter J, Bouckaert R, Xie D, Gavryushkin A, Drummond A PLoS Comput Biol. 2025; 21(2):e1012789.

PMID: 39937844 PMC: 11835378. DOI: 10.1371/journal.pcbi.1012789.


Finding high posterior density phylogenies by systematically extending a directed acyclic graph.

Jennings-Shaffer C, Rich D, Macaulay M, Karcher M, Ganapathy T, Kiami S ArXiv. 2024.

PMID: 39606729 PMC: 11601806.
