Data-Driven Modeling for Species-Level Taxonomic Assignment From 16S rRNA: Application to Human Microbiomes

Overview
Journal Front Microbiol
Specialty Microbiology
Date 2020 Dec 2
PMID 33262743
Citations 5
Abstract

With the emergence of next-generation sequencing (NGS) technology, a large number of metagenomic studies have estimated bacterial composition via 16S ribosomal RNA (16S rRNA) amplicon sequencing. In particular, subsets of the hypervariable regions of 16S rRNA, such as V1-V2 and V3-V4, are targeted by high-throughput sequencing, and each sequence is assigned to a taxon based on sequence homology. Because these sequences are highly homologous, or even identical, between species of the same genus, it is challenging to determine the exact species from 16S rRNA sequences alone. Therefore, in this study, species groups were defined to obtain the maximum species-level resolution achievable with 16S rRNA. Three major 16S rRNA databases are used independently for taxonomic assignment, and the lineage of certain bacteria is not consistent among them; on the basis of the NCBI taxonomy classification, we re-annotated the inconsistent lineage information in these three databases. For each species, we constructed a consensus sequence model for each hypervariable region and determined species groups consisting of species that are indistinguishable in terms of sequence homology. The species-level taxonomy was then determined using a k-nearest neighbor method and the species consensus sequence models; if the determined species is a member of a species group, the species group is assigned instead of a specific species. Notably, evaluation of our method on simulated and mock datasets showed that the estimated profiles correlated highly with the true bacterial composition. Furthermore, in the analysis of real microbiome samples, such as salivary and gut microbiome samples, our method successfully performed species-level profiling and identified differences in bacterial composition between phenotypic groups.
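The abstract describes the assignment pipeline only at a high level. The Python sketch below illustrates one possible reading of it; every name in it (`consensus`, `build_models`, `species_groups`, `assign`, the `max_mismatch` threshold, `k=3`) is hypothetical and is not the authors' implementation. It assumes reads are already aligned to equal length within one hypervariable region, uses Hamming distance as a stand-in for sequence homology, merges species whose consensus models are identical into a species group, and runs the k-nearest neighbor vote over labelled reference reads. How the authors actually combine k-NN with the consensus models is not stated in the abstract.

```python
from collections import Counter
from itertools import combinations


def consensus(aligned_seqs):
    """Majority base per column (assumes equal-length, pre-aligned reads)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


def hamming(a, b):
    """Mismatch count; a simplified proxy for homology (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def build_models(references):
    """Hypothetical helper: {species: [aligned reads]} -> {species: consensus}."""
    return {sp: consensus(seqs) for sp, seqs in references.items()}


def species_groups(models, max_mismatch=0):
    """Union-find over species whose consensus models differ by <= max_mismatch.

    Returns {species: sorted tuple of all species in its group}; singletons
    form groups of size one. The threshold is an illustrative assumption.
    """
    parent = {sp: sp for sp in models}

    def find(sp):
        while parent[sp] != sp:
            sp = parent[sp]
        return sp

    for a, b in combinations(models, 2):
        if hamming(models[a], models[b]) <= max_mismatch:
            parent[find(a)] = find(b)

    members = {}
    for sp in models:
        members.setdefault(find(sp), []).append(sp)
    return {sp: tuple(sorted(members[find(sp)])) for sp in models}


def assign(read, references, group_of, k=3):
    """k-NN majority vote over labelled reference reads.

    If the winning species belongs to a multi-species group, the group label
    (members joined by "/") is reported instead of a single species.
    """
    labelled = [(hamming(read, s), sp) for sp, seqs in references.items() for s in seqs]
    labelled.sort(key=lambda t: t[0])  # stable sort: ties keep insertion order
    votes = Counter(sp for _, sp in labelled[:k])
    best = votes.most_common(1)[0][0]
    group = group_of[best]
    return best if len(group) == 1 else "/".join(group)


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    refs = {
        "species_A": ["ACGTACGTAA", "ACGTACGTAA"],
        "species_B": ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT"],
        "species_C": ["TTGTACGCCA", "TTGTACGCCA"],
    }
    group_of = species_groups(build_models(refs))
    print(assign("ACGTACGTAA", refs, group_of))  # species_A/species_B
    print(assign("TTGTACGCCA", refs, group_of))  # species_C
```

With `max_mismatch=0`, only species whose consensus models are identical are merged; raising the threshold trades species-level resolution for fewer ambiguous single-species calls.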

Citing Articles

Analyzing microbiome data with taxonomic misclassification using a zero-inflated Dirichlet-multinomial model.

Koslovsky M. BMC Bioinformatics. 2025; 26(1):69.

PMID: 40016656 PMC: 11869466. DOI: 10.1186/s12859-025-06078-4.


Combining 16S Sequencing and qPCR Quantification Reveals Driven Bacterial Overgrowth in the Skin of Severe Atopic Dermatitis Patients.

De Tomassi A, Reiter A, Reiger M, Rauer L, Rohayem R, Study Group C. Biomolecules. 2023; 13(7).

PMID: 37509067 PMC: 10377005. DOI: 10.3390/biom13071030.


Environmental DNA and visual encounter surveys for amphibian biomonitoring in aquatic environments of the Ecuadorian Amazon.

Quilumbaquin W, Carrera-Gonzalez A, Van der Heyden C, Ortega-Andrade H. PeerJ. 2023; 11:e15455.

PMID: 37456876 PMC: 10348306. DOI: 10.7717/peerj.15455.


Probiotics and their Metabolites Reduce Oxidative Stress in Middle-Aged Mice.

Lin W, Lin J, Kuo Y, Chiang P, Ho H. Curr Microbiol. 2022; 79(4):104.

PMID: 35157139 PMC: 8843923. DOI: 10.1007/s00284-022-02783-y.


Species-Level Resolution of Female Bladder Microbiota from 16S rRNA Amplicon Sequencing.

Hoffman C, Siddiqui N, Fields I, Gregory W, Simon H, Mooney M. mSystems. 2021; 6(5):e0051821.

PMID: 34519534 PMC: 8547459. DOI: 10.1128/mSystems.00518-21.

