
Fine-scale Differentiation Between Bacillus anthracis and Bacillus cereus Group Signatures in Metagenome Shotgun Data

Overview
Journal PeerJ
Date 2018 Aug 30
PMID 30155371
Citations 3
Abstract

Background: It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especially likely when two closely related organisms are present in the same sample. Bacillus anthracis, the etiologic agent of anthrax, is a high-consequence pathogen that shares >99% average nucleotide identity with Bacillus cereus group (BCerG) genomes. Our goal was to create an analysis tool that uses k-mers to detect B. anthracis while incorporating information about the coverage of BCerG in the metagenome sample.

Methods: Using public complete genome sequence datasets, we identified a set of 31-mer signatures that differentiated B. anthracis from other members of the B. cereus group (BCerG), and another set that differentiated BCerG genomes (including B. anthracis) from other Bacillus strains. We also created a set of 31-mers for detecting the lethal factor gene, the key genetic diagnostic of the presence of anthrax-causing bacteria. We created synthetic sequence datasets based on existing genomes to test the accuracy of a k-mer based detection model.
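The signature-selection idea in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the selection rule used here (a k-mer must occur in every target genome and in no background genome) is an assumption, since the abstract does not state the exact criterion. K-mers are stored in canonical form (the lexicographically smaller of a k-mer and its reverse complement) so that strand does not matter.

```python
from typing import Iterable, Set

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 31) -> Set[str]:
    """Return the set of canonical k-mers found in seq.

    Windows containing any base other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # ambiguous base in this window
        reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, reverse_complement))
    return kmers


def signature_kmers(targets: Iterable[str],
                    background: Iterable[str],
                    k: int = 31) -> Set[str]:
    """K-mers shared by all target genomes and absent from every background genome.

    Assumed rule for illustration only; the published criterion may differ.
    """
    target_sets = [canonical_kmers(genome, k) for genome in targets]
    if not target_sets:
        return set()
    shared = set.intersection(*target_sets)
    for genome in background:
        shared -= canonical_kmers(genome, k)
    return shared
```

With real data, `targets` would hold B. anthracis genome sequences and `background` the remaining BCerG genomes. For whole genomes, a dedicated k-mer counter would be far more memory-efficient than Python sets.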

Results: We found 239,503 B. anthracis-specific 31-mers, 10,183 BCerG 31-mers, and 2,617 lethal factor 31-mers. We showed that false positive B. anthracis k-mers, which arise from random sequencing errors, are observable at high genome coverages of B. cereus. We also showed that there is a "gray zone" below 0.184× coverage of the B. anthracis genome sequence, in which we cannot expect with high probability to identify lethal factor k-mers. We created a linear regression model to differentiate the presence of B. anthracis-like chromosomes from sequencing errors given the BCerG background coverage. We showed that while shotgun datasets from the New York City subway metagenome project had no matches to lethal factor k-mers and hence were negative for B. anthracis, some samples showed evidence of strains very closely related to the pathogen.
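To make the regression idea concrete, the following Python sketch fits an ordinary least-squares line relating BCerG coverage to the number of B. anthracis-specific k-mers expected from sequencing error alone, then flags a sample whose observed count is well above that expectation. The abstract does not give the model's variables, coefficients or decision rule, so the function names, the training inputs and the `margin` threshold are all hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def exceeds_error_background(bcerg_coverage: float,
                             observed_ba_kmers: int,
                             slope: float,
                             intercept: float,
                             margin: float = 2.0) -> bool:
    """True if the observed count is more than `margin` times the count
    predicted from sequencing errors at this BCerG coverage.

    `margin` is an illustrative threshold, not a value from the paper.
    """
    expected = max(0.0, slope * bcerg_coverage + intercept)
    return observed_ba_kmers > margin * expected
```

In this sketch the line would be fitted on synthetic B. cereus-only datasets, where every B. anthracis-specific k-mer hit is by construction a false positive, and then applied to real samples.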

Discussion: This work shows how extensive libraries of complete genomes can be used to create organism-specific signatures to help interpret metagenomes. We contrast "specialist" approaches to metagenome analysis, such as this work, with "generalist" software that seeks to classify all organisms present in the sample, and we note the more general utility of a k-mer filter approach when taxonomic boundaries lack clarity or high levels of precision are required.
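A k-mer filter of the kind described in the Discussion can be sketched in a few lines of Python: slide a window over each read and count the windows whose canonical form appears in a precomputed signature set. This is an illustrative sketch, not the authors' tool; the function name is hypothetical and the signature set is assumed to already be stored in canonical form (the lexicographically smaller of each k-mer and its reverse complement).

```python
from typing import Iterable, Set

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_signature_hits(reads: Iterable[str],
                         signatures: Set[str],
                         k: int = 31) -> int:
    """Count k-mer windows across all reads that match a signature k-mer.

    Windows with ambiguous bases never match, because the signature set
    is assumed to contain only A, C, G and T.
    """
    hits = 0
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            if min(kmer, reverse_complement) in signatures:
                hits += 1
    return hits
```

Running the same function once per signature set (B. anthracis-specific, BCerG, lethal factor) gives the per-set hit counts that a downstream decision rule would compare.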

Citing Articles

MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis.

Furstenau T, Schneider T, Shaffer I, Vazquez A, Sahl J, Fofanov V PeerJ. 2022; 10:e14292.

PMID: 36389404 PMC: 9651046. DOI: 10.7717/peerj.14292.


On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference.

Criscuolo A F1000Res. 2020; 9:1309.

PMID: 33335719 PMC: 7713896. DOI: 10.12688/f1000research.26930.1.


Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling.

Panyukov V, Kiselev S, Ozoline O Int J Mol Sci. 2020; 21(3).

PMID: 32023871 PMC: 7037511. DOI: 10.3390/ijms21030944.
