Reconciliation with Non-binary Species Trees
Molecular Biology
Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| · (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree, and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program that provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
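To make the overestimation claim concrete, the following is a minimal sketch of the classical LCA-mapping duplication test designed for binary species trees. It is not the algorithm presented in this work or the NOTUNG implementation. The nested-tuple tree encoding and the names `build_parents`, `lca`, and `count_lca_duplications` are assumptions made for illustration only. Each gene-tree node is mapped to the lowest common ancestor of its children's mappings, and the node is called a duplication when it maps to the same species node as one of its children.

```python
# Illustrative sketch only: classical LCA reconciliation, not the paper's method.
# Trees are nested tuples; leaves are species names (strings).
# Gene-tree leaves are labeled by the species they were sampled from.

def build_parents(species_tree):
    """Return (parent, depth) dictionaries for every node of the species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of species nodes a and b (naive upward walk)."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_lca_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that the classical test labels as duplications."""
    parent, depth = build_parents(species_tree)
    dups = 0

    def map_node(g):
        nonlocal dups
        if isinstance(g, str):        # a gene leaf maps to its species leaf
            return g
        left, right = (map_node(child) for child in g)  # binary gene tree
        m = lca(left, right, parent, depth)
        if m == left or m == right:   # classical duplication condition
            dups += 1
        return m

    map_node(gene_tree)
    return dups


binary_species = (("A", "B"), "C")
polytomy_species = ("A", "B", "C")    # unresolved three-way branch point
gene = (("A", "B"), "C")              # one gene copy per species

print(count_lca_duplications(gene, binary_species))    # 0
print(count_lca_duplications(gene, polytomy_species))  # 1
```

With the polytomy, the classical test reports one duplication even though each species contributes a single gene copy and the gene tree is one possible resolution of the unresolved node. This is the kind of overcount the abstract attributes to existing algorithms when they are applied to non-binary species trees.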