Spectral Top-down Recovery of Latent Tree Models

Overview
Journal Inf Inference
Date 2023 Aug 18
PMID 37593361
Abstract

Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a nonrandom way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler procedure for merging the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
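
The partitioning step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Laplacian is constructed, so the code below makes assumptions: it uses a generic unnormalized graph Laplacian built from a hypothetical pairwise similarity matrix between terminal nodes, and it splits the nodes by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The function name `fiedler_partition` and the toy matrix are illustrative and are not taken from the paper.

```python
import numpy as np


def fiedler_partition(similarity):
    """Split nodes into two groups by the sign of the Fiedler vector.

    Assumption: `similarity` is a symmetric matrix of pairwise similarities
    between terminal nodes. The Laplacian used here is the generic
    unnormalized graph Laplacian, not necessarily the one used by STDR.
    """
    weights = np.abs(np.asarray(similarity, dtype=float))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 holds the eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return fiedler >= 0


# Toy example: nodes {0, 1, 2} and {3, 4, 5} are strongly similar within
# each group and only weakly similar across groups.
S = np.full((6, 6), 0.05)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
labels = fiedler_partition(S)
```

In a top-down scheme, a split like this would be applied recursively to each group until the subsets are small enough for a standard tree recovery method. The overall sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which side is labeled `True`.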
