
Strain Level Microbial Detection and Quantification with Applications to Single Cell Metagenomics

Overview
Journal: Nat Commun
Specialty: Biology
Date: 2022 Oct 28
PMID: 36307411
Abstract

Computational identification and quantification of distinct microbes from high throughput sequencing data is crucial for our understanding of human health. Existing methods either use accurate but computationally expensive alignment-based approaches or less accurate but computationally fast alignment-free approaches, which often fail to correctly assign reads to genomes. Here we introduce CAMMiQ, a combinatorial optimization framework to identify and quantify distinct genomes (specified by a database) in a metagenomic dataset. As a key methodological innovation, CAMMiQ uses substrings of variable length and those that appear in two genomes in the database, as opposed to the commonly used fixed-length, unique substrings. These substrings make it possible to accurately decouple mixtures of highly similar genomes, resulting in higher accuracy than the leading alternatives without requiring additional computational resources, as demonstrated on commonly used benchmarking datasets. Importantly, we show that CAMMiQ can distinguish closely related bacterial strains in simulated metagenomic and real single-cell metatranscriptomic data.
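The contrast the abstract draws between substrings unique to one genome and substrings shared by exactly two genomes can be illustrated with a small brute-force sketch. It is not CAMMiQ's code: the function names (`occurring_in`, `shortest_markers`) are hypothetical, and the "shortest substring starting at each position" definition is a simplifying assumption that may differ from the paper's exact definitions. A realistic genome database would need an indexed structure such as a suffix array instead of repeated substring scans.

```python
"""Toy illustration (not CAMMiQ's implementation) of unique vs. doubly-unique substrings."""
from collections import defaultdict


def occurring_in(substring, genomes):
    """Return the names of genomes whose sequence contains `substring` (brute force)."""
    return frozenset(name for name, seq in genomes.items() if substring in seq)


def shortest_markers(genomes):
    """For each start position of each genome, record the shortest substring
    starting there that occurs in exactly one genome ("unique") and the
    shortest that occurs in exactly two genomes ("doubly unique"), if any.

    Assumed, simplified definition for illustration only.
    Returns {frozenset of genome names: set of marker substrings}.
    """
    markers = defaultdict(set)
    for seq in genomes.values():
        for start in range(len(seq)):
            found_counts = set()  # which counts (1 or 2) we already recorded here
            for end in range(start + 1, len(seq) + 1):
                sub = seq[start:end]
                hits = occurring_in(sub, genomes)
                # Extending a substring can only keep or shrink the set of
                # genomes containing it, so the first length reaching a given
                # count is the shortest substring with that count.
                if len(hits) in (1, 2) and len(hits) not in found_counts:
                    markers[hits].add(sub)
                    found_counts.add(len(hits))
                if len(hits) == 1:
                    break  # longer extensions stay unique; nothing new to find
    return dict(markers)


if __name__ == "__main__":
    toy_db = {"g1": "AAC", "g2": "AAG", "g3": "ATG"}  # hypothetical sequences
    for genome_set, subs in sorted(shortest_markers(toy_db).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(genome_set), sorted(subs))
```

In this toy database `AA` is not unique to any genome, but it occurs only in `g1` and `g2`, so it still narrows a read's origin to two candidates. Evidence of this kind is what a downstream optimization step could combine with unique substrings when separating similar genomes.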

Citing Articles

MeStanG-Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation.

Ramos Lopez D, Flores F, Espindola A. Biology (Basel). 2025; 14(1).

PMID: 39857299 PMC: 11762867. DOI: 10.3390/biology14010069.


kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes.

Defazio G, Tangaro M, Pesole G, Fosso B. Brief Bioinform. 2025; 26(1).

PMID: 39749666 PMC: 11695915. DOI: 10.1093/bib/bbae680.


Beyond the Gut: The intratumoral microbiome's influence on tumorigenesis and treatment response.

Zhang H, Fu L, Leiliang X, Qu C, Wu W, Wen R. Cancer Commun (Lond). 2024; 44(10):1130-1167.

PMID: 39087354 PMC: 11483591. DOI: 10.1002/cac2.12597.


Identification of intracellular bacteria from multiple single-cell RNA-seq platforms using CSI-Microbes.

Robinson W, Stone J, Schischlik F, Gasmi B, Kelly M, Seibert C. Sci Adv. 2024; 10(27):eadj7402.

PMID: 38959321 PMC: 11221508. DOI: 10.1126/sciadv.adj7402.


Fast, parallel, and cache-friendly suffix array construction.

Khan J, Rubel T, Molloy E, Dhulipala L, Patro R. Algorithms Mol Biol. 2024; 19(1):16.

PMID: 38679714 PMC: 11056320. DOI: 10.1186/s13015-024-00263-5.

