
MIMt: a Curated 16S rRNA Reference Database with Less Redundancy and Higher Accuracy at Species-level Identification

Overview
Date 2024 Nov 10
PMID 39522045
Abstract

Motivation: Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major challenges in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which contain either a lot of redundancy or a high percentage of sequences with missing taxonomic information. These limitations may lead to erroneous identifications and, thus, to inaccurate conclusions regarding the ecological role and importance of those microorganisms in the ecosystem.

Results: The current study presents MIMt, a new 16S rRNA database for the identification of archaea and bacteria, encompassing 47 001 sequences, all precisely identified at the species level. In addition, a MIMt2.0 version was created containing only curated sequences from RefSeq Targeted loci, with 32 086 sequences. MIMt is intended to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy, enabling more precise assignments at lower taxonomic ranks and thus significantly improving species-level identification.
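The abstract does not say how redundancy was quantified. As a rough illustration only, the sketch below assumes one simple definition: the fraction of records whose sequence is an exact duplicate of another record in the same FASTA file. The function names `parse_fasta` and `redundancy` and the toy sequences are hypothetical and are not taken from MIMt or from the study.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may span several lines; normalise case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def redundancy(records):
    """Assumed metric: 1 - (distinct sequences / total sequences)."""
    counts = Counter(seq for _, seq in records)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - len(counts) / total


# Toy input (made-up sequences, not real 16S rRNA data).
toy = """>seq1 Escherichia coli
ACGTACGT
>seq2 Escherichia coli
ACGTACGT
>seq3 Bacillus subtilis
TTGACCGA
>seq4 unclassified
GGCC
ttaa
"""

records = list(parse_fasta(toy))
print(redundancy(records))  # 4 records, 3 distinct sequences -> 0.25
```

Real database comparisons would more likely cluster sequences at an identity threshold rather than count exact duplicates; the exact-match version is used here only because it needs no external tools.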
