Short Branch Attraction in Phylogenomic Inference Under the Multispecies Coalescent

Overview
Journal Front Ecol Evol
Date 2024 Sep 5
PMID 39233780
Abstract

Accurate reconstruction of species trees often relies on the quality of input gene trees estimated from molecular sequences. Previous studies suggested that, when the sequence length is fixed, maximum likelihood may produce biased gene trees, which subsequently mislead inference of species trees. Two key questions need to be answered in this context: which scenarios may result in consistently biased gene trees, and, for those scenarios, are there remedies that remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree with two short branches leading to two of the species, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree that groups the two species with short branches. We name this inconsistent behavior short branch attraction, which may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely if the internal branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and thus suffers the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in the sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce artifacts induced by short branch attraction.
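
The remedy named in the abstract, converting short internal branches to polytomies, can be illustrated with a minimal sketch. This is not the authors' implementation. The `Node` class, the helper names, the simplified Newick parser (names and branch lengths only) and the threshold of 1e-3 substitutions per site are all assumptions made for illustration; the abstract does not state how short a branch must be before it is collapsed.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node: a label, a branch length, and children."""
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


SPECIAL = {"(", ")", ",", ";"}


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (no quoted labels, no comments)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            pos += 1  # consume ")"
        if pos < len(tokens) and tokens[pos] not in SPECIAL:
            label, _, length = tokens[pos].partition(":")
            node.name = label.strip()
            node.length = float(length) if length else None
            pos += 1
        return node

    return parse_node()


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Turn internal branches shorter than `threshold` into polytomies (in place)."""
    new_children = []
    for child in node.children:
        collapse_short_branches(child, threshold)  # handle deeper branches first
        is_internal = bool(child.children)
        if is_internal and child.length is not None and child.length < threshold:
            # Attach the grandchildren directly to `node`; add the removed
            # length to each so root-to-tip distances are preserved.
            for grandchild in child.children:
                if grandchild.length is not None:
                    grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Serialize the topology only (branch lengths omitted for brevity)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


# Illustrative 4-taxon gene tree with a very short internal branch.
tree = parse_newick("((A:0.1,B:0.1):0.0001,C:0.1,D:0.1);")
collapse_short_branches(tree, threshold=1e-3)  # threshold is an assumed value
print(to_newick(tree) + ";")  # prints "(A,B,C,D);"
```

After collapsing, the weakly supported grouping of A and B is no longer passed to the species tree method as if it were resolved. In practice the threshold would need to be chosen with the sequence length in mind, since a branch is only uninformative when few or no substitutions are expected along it.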

Citing Articles

Internal Morphology and Phylogenetic Position of (Pancrustacea: Rhizocephala), an Enigmatic Parasitic Barnacle.

Miroliubov A, Lianguzova A, Krupenko D, Poliushkevich L, Novokreshchennykh S, Arbuzova N. Biology (Basel). 2025; 13(12).

PMID: 39765635 PMC: 11673852. DOI: 10.3390/biology13120968.
