
Statistics and Truth in Phylogenomics

Overview
Journal: Mol Biol Evol
Specialty: Biology
Date: 2011 Aug 30
PMID: 21873298
Citations: 84
Abstract

Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (small P values). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses, depending on the evolutionary model and inference method used, which makes it difficult to establish true relationships. We argue that assessing the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Here again, a focus on effect size and biological relevance, rather than on the P value, may be warranted. We present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
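
The interplay between effect size, bias, and P value described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes a simple two-sided z-test with unit variance, and the function name `two_sided_p` and all numbers are hypothetical. It shows how a fixed, very small standardized effect (which could just as well be a systematic bias) moves from non-significant to overwhelmingly significant as the amount of data grows.

```python
import math


def two_sided_p(effect_size: float, n: int) -> float:
    """Two-sided P value of a one-sample z-test with unit variance.

    Assumption for illustration only: the test statistic is
    z = effect_size * sqrt(n), so the P value is erfc(|z| / sqrt(2)).
    """
    z = effect_size * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values: a tiny standardized difference that may be
# biologically irrelevant, or entirely due to model misspecification.
tiny_effect = 0.01

for n in (100, 10_000, 1_000_000):
    p = two_sided_p(tiny_effect, n)
    print(f"n={n:>9,d}  effect={tiny_effect}  P={p:.3g}")
```

In this toy setting the effect size never changes; only the sample size does. That is why the P value alone cannot distinguish a large, biologically meaningful signal from a small bias amplified by genome-scale data.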

Citing Articles

MEGA12: Molecular Evolutionary Genetic Analysis version 12 for adaptive and green computing.

Kumar S, Stecher G, Suleski M, Sanderford M, Sharma S, Tamura K Mol Biol Evol. 2024; .

PMID: 39708372 PMC: 11683415. DOI: 10.1093/molbev/msae263.


The evolutionary history of the ancient weevil family Belidae (Coleoptera: Curculionoidea) reveals the marks of Gondwana breakup and major floristic turnovers, including the rise of angiosperms.

Li X, Marvaldi A, Oberprieler R, Clarke D, Farrell B, Sequeira A Elife. 2024; 13.

PMID: 39665616 PMC: 11637463. DOI: 10.7554/eLife.97552.


An Analysis of Combined Molecular Weight and Hydrophobicity Similarity between the Amino Acid Sequences of Spike Protein Receptor Binding Domains of Betacoronaviruses and Functionally Similar Sequences from Other Virus Families.

Dixson J, Vumma L, Azad R Microorganisms. 2024; 12(10).

PMID: 39458330 PMC: 11510113. DOI: 10.3390/microorganisms12102021.


The Meaning and Measure of Concordance Factors in Phylogenomics.

Lanfear R, Hahn M Mol Biol Evol. 2024; 41(11).

PMID: 39418118 PMC: 11532913. DOI: 10.1093/molbev/msae214.


Natural selection and recombination at host-interacting lipoprotein loci drive genome diversification of Lyme disease and related bacteria.

Akther S, Mongodin E, Morgan R, Di L, Yang X, Golovchenko M mBio. 2024; 15(9):e0174924.

PMID: 39145656 PMC: 11389397. DOI: 10.1128/mbio.01749-24.

