Benchmarking of Methods for Genomic Taxonomy

Overview
Specialty Microbiology
Date 2014 Feb 28
PMID 24574292
Citations 193
Abstract

One of the first questions that arises when a prokaryotic organism of interest is encountered is what it is, that is, which species it belongs to. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performance of each method was subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal core genes have difficulty distinguishing closely related species that diverged only recently. The KmerFinder method had the highest overall accuracy and correctly identified 93% to 97% of the isolates in the evaluation sets.
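The idea of counting co-occurring k-mers can be illustrated with a minimal sketch. This is not the KmerFinder implementation: the abstract does not give its k-mer length, sampling strategy, or scoring, so the function names (`kmers`, `build_index`, `rank_references`), the value `k=5`, and the toy sequences below are all assumptions chosen for illustration. The sketch indexes the k-mers of each reference genome and ranks references by how many distinct k-mers they share with a query.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def rank_references(query, index, k):
    """Count, per reference, how many distinct query k-mers it shares.

    Returns (name, shared_kmer_count) pairs, best match first.
    """
    hits = Counter()
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return hits.most_common()


# Toy data (hypothetical sequences, far shorter than real genomes).
references = {
    "species_A": "ATGGCGTACGTTAGCCGATA",
    "species_B": "ATGGCGTTCGTAAGCCGTTA",
}
index = build_index(references, k=5)

# The query is a fragment taken from species_A.
query = "GCGTACGTTAGC"
ranking = rank_references(query, index, k=5)
best_name, best_score = ranking[0]
print(best_name, best_score)
```

In this toy case the query shares all of its 5-mers with `species_A` and only one with `species_B`, so `species_A` ranks first. A practical tool would use a much longer k and would handle reverse complements and sequencing errors, none of which is attempted here.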

Citing Articles

Whole-Genome Shotgun Sequencing from Chicken Clinical Tracheal Samples for Bacterial and Novel Bacteriophage Identification.

Chrzastek K, Seal B, Kulkarni A, Kapczynski D. Vet Sci. 2025; 12(2).

PMID: 40005922 PMC: 11861695. DOI: 10.3390/vetsci12020162.


Genomic diversity and evolutionary patterns of affecting farmed striped catfish () in Vietnam over 20 years.

Payne C, Phuong V, Phuoc N, Dung T, Phuoc L, Crumlish M. Microb Genom. 2025; 11(2).

PMID: 39969283 PMC: 11840174. DOI: 10.1099/mgen.0.001368.


Oral colonization of antimicrobial-resistant bacteria in home health care participants and their association with oral and systemic status.

Nishihama S, Kawada-Matsuo M, Le M, Fujii A, Haruta A, Kajihara T. Sci Rep. 2025; 15(1):5776.

PMID: 39962261 PMC: 11832749. DOI: 10.1038/s41598-025-90037-9.


Leclercia barmai sp. nov., isolated from worm castings of Eisenia fetida, is a urease-positive, 3-nitropropionic acid and glycerol-consuming bacterium.

Barman P, Sinha S, Chakraborty R. Sci Rep. 2025; 15(1):5615.

PMID: 39955304 PMC: 11830034. DOI: 10.1038/s41598-024-78134-7.


in Common Dolphins () in Portugal-Characterization of First Isolates.

Cavaco S, Grilo M, Dias R, Nunes M, Pascoal P, Pereira M. Animals (Basel). 2025; 15(3).

PMID: 39943144 PMC: 11816080. DOI: 10.3390/ani15030374.

