Imputing Missing Distances in Molecular Phylogenetics

Overview
Journal: PeerJ
Date: 2018 Aug 2
PMID: 30065887
Citations: 3
Abstract

Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix, or from a set of aligned sequences with missing genes in which some pairs of sequences share no homologous sites (so their distances need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and that the accuracy of the resulting tree is almost as good as that of the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method, based on simulated sequences). I have implemented the function in the DAMBE software, which is freely available at http://dambe.bio.uottawa.ca.
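The least-squares idea in the abstract can be illustrated with a minimal sketch. It is not the DAMBE implementation: it assumes a single fixed tree topology (the search over tree space is omitted), fits branch lengths by ordinary least squares to the observed distances only, and reads each missing distance off the fitted tree as a path length. The function names, the representation of a tree as a list of splits, and the toy five-taxon data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def impute_on_fixed_tree(taxa, splits, observed):
    """Impute missing pairwise distances on one fixed topology (illustrative only).

    splits:   one set of taxa per branch; a branch lies on the path between
              i and j exactly when one of them is inside the set.
    observed: dict mapping (i, j) pairs, in the order of `taxa`, to distances.
    """
    pairs = list(combinations(taxa, 2))

    def path_row(i, j):
        # Indicator vector of the branches separating taxon i from taxon j.
        return [1.0 if (i in s) != (j in s) else 0.0 for s in splits]

    rows = [path_row(i, j) for (i, j) in pairs if (i, j) in observed]
    dists = [observed[p] for p in pairs if p in observed]
    k = len(splits)
    # Normal equations (X^T X) b = X^T d for the least-squares branch lengths.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xtd = [sum(r[a] * d for r, d in zip(rows, dists)) for a in range(k)]
    lengths = solve_linear(xtx, xtd)
    # A missing distance is the path length between the two taxa on the fitted tree.
    return {
        p: sum(b * x for b, x in zip(lengths, path_row(*p)))
        for p in pairs
        if p not in observed
    }


# Toy example: assumed topology ((A,B),C,(D,E)) with the A-E distance missing.
taxa = ["A", "B", "C", "D", "E"]
splits = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"A", "B"}, {"D", "E"}]
observed = {
    ("A", "B"): 3.0, ("A", "C"): 5.0, ("A", "D"): 6.0,
    ("B", "C"): 6.0, ("B", "D"): 7.0, ("B", "E"): 6.0,
    ("C", "D"): 7.0, ("C", "E"): 6.0, ("D", "E"): 3.0,
}
imputed = impute_on_fixed_tree(taxa, splits, observed)
print(imputed)
```

The toy distances were generated from a tree with known branch lengths, so the imputed A-E value should match the withheld true distance of 5 up to rounding error. A full method along the lines the abstract describes would repeat this fit across candidate topologies and keep the tree and imputed distances with the smallest residual sum of squares; that search is deliberately left out here.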

Citing Articles

PhyloMissForest: a random forest framework to construct phylogenetic trees with missing data.

Pinheiro D, Santander-Jimenez S, Ilic A. BMC Genomics. 2022; 23(1):377.

PMID: 35585494 PMC: 9116704. DOI: 10.1186/s12864-022-08540-6.


Updating the bionomy and geographical distribution of Anopheles (Nyssorhynchus) albitarsis F: A vector of malaria parasites in northern South America.

Zuniga M, Rubio-Palis Y, Brochero H. PLoS One. 2021; 16(6):e0253230.

PMID: 34138918 PMC: 8211218. DOI: 10.1371/journal.pone.0253230.


Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices.

Bhattacharjee A, Bayzid M. BMC Genomics. 2020; 21(1):497.

PMID: 32689946 PMC: 7370488. DOI: 10.1186/s12864-020-06892-5.
