Phylogenetic Analysis That Models Compositional Heterogeneity over the Tree

Overview

Journal: Methods Mol Biol

Specialty: Molecular Biology

Date: 2022 Sep 9

PMID: 36083446

Authors

Peter G Foster


Abstract

Molecular sequences in a phylogenetic analysis can differ in composition, which indicates that the process of evolution can change over time. However, the models of evolution in common use are homogeneous over the tree, and when they are applied to datasets whose composition is heterogeneous over the tree they can recover incorrect trees. The Node-Discrete Compositional Heterogeneity (NDCH) model can model such data by accommodating differences in composition over the tree. Usage, problems, and limitations of this model are discussed, and a modification, the NDCH2 model, is described that can ameliorate some of these problems and limitations. Using these models can greatly increase the fit of the model to the data and can find better tree topologies. These models and various statistical tests are illustrated using a bacterial SSU rRNA dataset. The models are implemented in the software P4, and files for the analyses described here are made available.
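To make the idea of compositional differences among sequences concrete, the following is a minimal sketch of the classic chi-square check of compositional homogeneity on a taxon-by-nucleotide count table. It is an illustration only: it does not use P4, it is not necessarily one of the tests used in the chapter, and the function name `composition_chi2` and the toy alignment are assumptions made for this example. Because the check treats sequences as independent and ignores their shared ancestry on the tree, it should be read as a rough screen rather than a rigorous test.

```python
from collections import Counter


def composition_chi2(alignment, alphabet="ACGT"):
    """Chi-square statistic and degrees of freedom for a taxa-by-character table.

    `alignment` maps taxon name -> sequence string (hypothetical input format).
    Characters outside `alphabet` (gaps, ambiguity codes) are ignored.
    """
    # Observed counts of each character in each sequence.
    counts = {
        name: Counter(ch for ch in seq.upper() if ch in alphabet)
        for name, seq in alignment.items()
    }
    row_totals = {name: sum(c.values()) for name, c in counts.items()}
    col_totals = {ch: sum(c[ch] for c in counts.values()) for ch in alphabet}
    grand_total = sum(row_totals.values())

    chi2 = 0.0
    for name, c in counts.items():
        for ch in alphabet:
            # Expected count if every sequence shared the pooled composition.
            expected = row_totals[name] * col_totals[ch] / grand_total
            if expected > 0:
                chi2 += (c[ch] - expected) ** 2 / expected

    # Only characters that actually occur contribute a column.
    used_columns = sum(1 for ch in alphabet if col_totals[ch] > 0)
    dof = (len(counts) - 1) * (used_columns - 1)
    return chi2, dof


# Toy data (made up): two GC-rich and two AT-rich sequences.
heterogeneous = {
    "t1": "GGCCGGCCGCGC",
    "t2": "GCGCGGCCGGCC",
    "t3": "AATTATATAATT",
    "t4": "ATATTTAAATAT",
}
# Toy data (made up): every sequence has the same composition.
homogeneous = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGT",
    "c": "TGCATGCATGCA",
}

het_chi2, het_dof = composition_chi2(heterogeneous)
hom_chi2, hom_dof = composition_chi2(homogeneous)
print(het_chi2, het_dof)
print(hom_chi2, hom_dof)
```

A large statistic relative to its degrees of freedom, as in the first toy alignment, suggests that a single composition shared over the whole tree may fit the data poorly, which is the situation that tree-heterogeneous models such as NDCH are meant to address.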
