
On the Transformation of MinHash-based Uncorrected Distances into Proper Evolutionary Distances for Phylogenetic Inference

Overview
Journal: F1000Res
Date: 2020 Dec 18
PMID: 33335719
Citations: 11
Abstract

Recently developed MinHash-based techniques have proven successful at quickly estimating the level of similarity between large nucleotide sequences. This article discusses their practical use and limitations for approximating uncorrected distances between genomes and for transforming these pairwise dissimilarities into proper evolutionary distances. Notably, it shows that complex distance measures can be easily approximated using simple transformation formulae based on a few parameters. MinHash-based techniques can therefore be very useful for implementing fast yet accurate alignment-free phylogenetic reconstruction procedures from large sets of genomes. This last point is assessed through a simulation study using a dedicated bioinformatics tool.
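
To make the pipeline described in the abstract concrete, here is a minimal Python sketch of its three steps: estimating a Jaccard index from MinHash sketches, converting it into an uncorrected distance, and correcting that distance with a simple transformation formula. It is an illustration under stated assumptions, not the article's own method or tool. The function names, the hash choice, and the parameters k (k-mer length) and s (sketch size) are hypothetical. The conversion p = 1 - (2j / (1 + j))^(1/k) assumes independent substitutions along the sequence, and the Jukes-Cantor (JC69) correction is used only as a familiar one-formula example of such a transformation.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of k-mers of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest 64-bit hash values of the k-mers."""
    hashes = {
        int.from_bytes(hashlib.blake2b(m.encode(), digest_size=8).digest(), "big")
        for m in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a, b, s):
    """Estimate the Jaccard index from two bottom-s sketches.

    Take the s smallest hashes of the merged sketches and count
    how many of them occur in both sketches.
    """
    union = sorted(set(a) | set(b))[:s]
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


def p_distance(j, k):
    """Uncorrected distance from a Jaccard index j.

    Assumption: each k-mer is conserved with probability (1 - p)^k,
    which gives (1 - p)^k = 2j / (1 + j).
    """
    if j <= 0.0:
        return 1.0
    return 1.0 - (2.0 * j / (1.0 + j)) ** (1.0 / k)


def jc69(p):
    """Jukes-Cantor correction: d = -(3/4) ln(1 - 4p/3); infinite once p >= 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    k, s = 5, 50  # toy values; real genomes call for larger k and s
    x = "ACGTTGCAAGCTTAGCCGATAGCTAGGATCCA" * 4
    y = x[:40] + "T" + x[41:]  # same sequence with one position overwritten
    j = jaccard_estimate(sketch(x, k, s), sketch(y, k, s), s)
    p = p_distance(j, k)
    print(f"J = {j:.3f}, p = {p:.4f}, JC69 d = {jc69(p):.4f}")
```

The correction step is deliberately separate from the sketching step: any other transformation of the uncorrected distance could be substituted for jc69 without touching the rest.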

Citing Articles

Multiple introductions of NRCS-A to the neonatal intensive care unit drive neonatal bloodstream infections: a case-control and environmental genomic survey.

Lees E, Gentry J, Webster H, Sanderson N, Eyre D, Wilson D. Microb Genom. 2025; 11(1.

PMID: 39773387 PMC: 11706212. DOI: 10.1099/mgen.0.001340.


Description of Cohnella rhizoplanae sp. nov., isolated from the root surface of soybean (Glycine max).

Kampfer P, Glaeser S, McInroy J, Busse H, Clermont D, Criscuolo A. Antonie Van Leeuwenhoek. 2024; 118(2):41.

PMID: 39718652 PMC: 11668882. DOI: 10.1007/s10482-024-02051-y.


Rathayibacter tanaceti sp. nov., a Novel Actinobacterium from Tanacetum vulgare Infested by Foliar Nematode Aphelenchoides sp.

Starodumova I, Dorofeeva L, Prisyazhnaya N, Tarlachkov S, Vasilenko O, Avtukh A. Curr Microbiol. 2024; 81(5):123.

PMID: 38538917 DOI: 10.1007/s00284-024-03643-7.


Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent.

Prusokiene A, Boonham N, Fox A, Howard T. PLoS One. 2024; 19(3):e0298834.

PMID: 38512939 PMC: 10956839. DOI: 10.1371/journal.pone.0298834.


10.1.1, a Producer of Antimicrobial Agents.

Kudryakova I, Afoshin A, Tarlachkov S, Leontyevskaya E, Suzina N, Leontyevskaya Vasilyeva N. Microorganisms. 2023; 11(12).

PMID: 38137997 PMC: 10745450. DOI: 10.3390/microorganisms11122853.

