Biases in Phylogenetic Estimation Can Be Caused by Random Sequence Segments

Overview
Journal: J Mol Evol
Specialty: Biochemistry
Date: 2005 Jul 27
PMID: 16044245
Citations: 11
Abstract

We consider the effects of fully or partially random sequences on the estimation of four-taxon phylogenies. Fully or partially random sequences occur when whole subsets of sequences or some sites for subsets of sequences are independent of sequence data for the other taxa. Random sequences can be a consequence of misalignment or because sites evolve at very fast rates in some portions of a tree, a situation that occurs especially in analyses involving deep divergence times. One might reasonably speculate that random sites will only add noise to the estimation of a phylogeny. We show that in the case that a random sequence is added to a three-taxa alignment, it is more likely to be a neighbor of the sequence corresponding to the longest branch in the three-taxon tree. Surprisingly, when only about half of the sites show randomness, a long-branch-repels form of small sample bias occurs, and when a minority of sites show randomness this becomes a long-branch-attraction bias again. The most serious bias, one that does not vanish with increasing sequence length, occurs when more than one sequence is partially random. If there is a large amount of overlap in the random sites for two sequences, those two sequences will be attracted to each other; otherwise, they will repel each other. Random sequences or sites can, therefore, cause complicated biases in phylogenetic inference. We suggest performing analyses with and without potentially saturated sequences and/or misaligned sites, to check that these biases are not affecting the inferred branching pattern.
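The first result in the abstract (a random sequence added to a three-taxon alignment tends to group with the longest branch) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis: the abstract does not say which estimation method produced that result, so maximum parsimony is used here as a simple stand-in, and the Jukes-Cantor model, the star-shaped three-taxon tree, the branch lengths (0.05, 0.05, 1.0), the sequence length, the number of replicates, and all function names are hypothetical choices made for illustration.

```python
import math
import random

BASES = "ACGT"

def jc_diff_prob(t):
    """Jukes-Cantor probability that a site differs after branch length t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def evolve(base, t, rng):
    """Evolve one base along a branch of length t under Jukes-Cantor."""
    if rng.random() < jc_diff_prob(t):
        return rng.choice([b for b in BASES if b != base])
    return base

def simulate_partner(n_sites, branch_lengths, rng):
    """Return the taxon that parsimony pairs with the random taxon D, or 'tie'."""
    support = {"A": 0, "B": 0, "C": 0}
    for _ in range(n_sites):
        root = rng.choice(BASES)
        site = {name: evolve(root, t, rng) for name, t in branch_lengths.items()}
        d = rng.choice(BASES)  # taxon D is independent of the other three taxa
        for name in support:
            others = [site[o] for o in support if o != name]
            # Parsimony-informative pattern xxyy: D matches `name`, the other
            # two taxa match each other, and the two states differ.
            if site[name] == d and others[0] == others[1] and others[0] != d:
                support[name] += 1
    best = max(support.values())
    winners = [n for n, s in support.items() if s == best]
    return winners[0] if len(winners) == 1 else "tie"

def run(n_reps=200, n_sites=200, seed=1):
    """Count how often D is paired with each taxon across replicates."""
    rng = random.Random(seed)
    # Assumed branch lengths: A and B are short, C is the long branch.
    branch_lengths = {"A": 0.05, "B": 0.05, "C": 1.0}
    counts = {"A": 0, "B": 0, "C": 0, "tie": 0}
    for _ in range(n_reps):
        counts[simulate_partner(n_sites, branch_lengths, rng)] += 1
    return counts

if __name__ == "__main__":
    print(run())
```

Under these assumed settings, the count for the long-branch taxon C should dominate, because sites where A and B agree and C differs are common and the random sequence matches C at about a quarter of them. Rerunning with part of D derived from the tree rather than fully random would be one way to explore the partially random cases the abstract describes, although the direction of the bias there may depend on the estimation method.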

Citing Articles

Can quartet analyses combining maximum likelihood estimation and Hennigian logic overcome long branch attraction in phylogenomic sequence data?

Kuck P, Wilkinson M, Gross C, Foster P, Wagele J PLoS One. 2017; 12(8):e0183393.

PMID: 28841676 PMC: 5571918. DOI: 10.1371/journal.pone.0183393.


The influence of molecular markers and methods on inferring the phylogenetic relationships between the representatives of the Arini (parrots, Psittaciformes), determined on the basis of their complete mitochondrial genomes.

Urantowka A, Kroczak A, Mackiewicz P BMC Evol Biol. 2017; 17(1):166.

PMID: 28705202 PMC: 5513162. DOI: 10.1186/s12862-017-1012-1.


Utility of characters evolving at diverse rates of evolution to resolve quartet trees with unequal branch lengths: analytical predictions of long-branch effects.

Su Z, Townsend J BMC Evol Biol. 2015; 15:86.

PMID: 25968460 PMC: 4429678. DOI: 10.1186/s12862-015-0364-7.


Circumstances in which parsimony but not compatibility will be provably misleading.

Scotland R, Steel M Syst Biol. 2015; 64(3):492-504.

PMID: 25634097 PMC: 4395848. DOI: 10.1093/sysbio/syv008.


A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation.

Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C PLoS One. 2014; 9(11):e111122.

PMID: 25369365 PMC: 4219706. DOI: 10.1371/journal.pone.0111122.

