
Beyond the SNP Threshold: Identifying Outbreak Clusters Using Inferred Transmissions

Overview
Journal Mol Biol Evol
Specialty Biology
Date 2019 Jan 29
PMID 30690464
Citations 74
Abstract

Whole-genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analyzing WGS data is usually to define "transmission clusters," sets of cases that are potentially linked by direct transmission. This is often done by placing two cases in the same cluster if they are separated by fewer single-nucleotide polymorphisms (SNPs) than a specified threshold. However, there is little agreement as to what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterize this by combining the number of SNP differences and the length of time over which those differences have accumulated, using information about case timing, molecular clock, and transmission processes. Our framework has the advantage of allowing for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: one with British Columbia data, using spatial divisions, and one with Republic of Moldova data, incorporating antibiotic resistance. Simulation results indicate that our transmission-based method is better at identifying direct transmissions than a SNP threshold, with dissimilarity between clusterings averaging 0.27 bits, compared with 0.37 bits for the SNP-threshold method and 0.84 bits for randomly permuted data. These results show that the transmission-based method is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.
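The SNP-threshold baseline described in the abstract can be sketched in a few lines. The Python below is a minimal illustration, not the transcluster implementation: it assumes clusters are formed by single linkage (any chain of sub-threshold pairs joins a cluster), and the function name `snp_threshold_clusters` and the distance matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def snp_threshold_clusters(
    snp_dist: Sequence[Sequence[int]], threshold: int
) -> List[Set[int]]:
    """Link any two cases separated by fewer than `threshold` SNPs and
    return the resulting connected components (single linkage)."""
    n = len(snp_dist)
    parent = list(range(n))  # union-find forest, one node per case

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if snp_dist[i][j] < threshold:  # "fewer than" the threshold
                parent[find(i)] = find(j)

    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    # Order clusters by their smallest case index for stable output.
    return sorted(groups.values(), key=min)


# Hypothetical pairwise SNP distances for four cases (illustrative only).
example = [
    [0, 2, 9, 12],
    [2, 0, 4, 11],
    [9, 4, 0, 10],
    [12, 11, 10, 0],
]

print(snp_threshold_clusters(example, threshold=5))
print(snp_threshold_clusters(example, threshold=3))
```

With a threshold of 5, cases 0 and 2 land in the same cluster through case 1 even though they differ by 9 SNPs; with a threshold of 3, case 2 splits off. The sketch uses SNP counts alone and ignores sampling times, which is the information the probabilistic approach adds.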

Citing Articles

Comprehensive genomic surveillance reveals transmission profiles of extensively drug-resistant tuberculosis cases in Pará, Brazil.

Marcon D, Sharma A, Souza A, Barros R, Andrade V, Guimaraes R Front Microbiol. 2025; 15:1514862.

PMID: 39911713 PMC: 11794272. DOI: 10.3389/fmicb.2024.1514862.


Genotyped cluster investigations versus standard contact tracing: comparative impact on latent tuberculosis infection cascade of care in a low-incidence region.

Asare-Baah M, Seraphin M, Salmon-Trejo L, Johnston L, Dominique L, Ashkin D BMC Infect Dis. 2025; 25(1):74.

PMID: 39819477 PMC: 11740335. DOI: 10.1186/s12879-024-10358-4.


Exploring SNP filtering strategies: the influence of strict vs soft core.

Taouk M, Featherstone L, Taiaroa G, Seemann T, Ingle D, Stinear T Microb Genom. 2025; 11(1.

PMID: 39812553 PMC: 11734701. DOI: 10.1099/mgen.0.001346.


Early prediction of Mycobacterium tuberculosis transmission clusters using supervised learning models.

Gharamaleki O, Colijn C, Sekirov I, Johnston J, Sobkowiak B Sci Rep. 2024; 14(1):27652.

PMID: 39532933 PMC: 11557942. DOI: 10.1038/s41598-024-78247-z.


Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Hall M, Wick R, Judd L, Nguyen A, Steinig E, Xie O Elife. 2024; 13.

PMID: 39388235 PMC: 11466455. DOI: 10.7554/eLife.98300.

