
Selective Sweep Sites and SNP Dense Regions Differentiate Mycobacterium bovis Isolates Across Scales

Overview
Journal: Front Microbiol
Specialty: Microbiology
Date: 2022 Sep 26
PMID: 36160199
Abstract

Mycobacterium bovis, a bacterial zoonotic pathogen responsible for bovine tuberculosis (bTB), an economically and agriculturally important livestock disease, infects a broad range of mammalian hosts worldwide. This broad host range has led to bidirectional transmission between livestock and wildlife species and to the formation of wildlife reservoirs, both of which undermine bTB control measures. Next Generation Sequencing (NGS) has transformed our ability to understand disease transmission events by tracking variant sites. However, the genomic signatures of host adaptation following spillover, and the role of other genomic factors in the transmission process, remain understudied. We analyzed publicly available datasets collected from 700 hosts across three countries with bTB endemic regions (United Kingdom, United States, and New Zealand) to investigate whether genomic regions with high SNP density and/or selective sweep sites play a role in adaptation to new environments (e.g., at the host-species, geographical, and/or sub-population levels). A simulated alignment was created to generate null distributions for defining genomic regions with high SNP counts and regions with evidence of selective sweeps. Random Forest (RF) models were used to investigate evolutionary metrics within the genomic regions of interest and to determine which genomic processes best classified isolates across ecological scales. We identified 14 high SNP density regions and 132 selective sweep regions in the M. bovis genomes. Selective sweep regions were ranked as the most important for classification across the different scales in all RF models. SNP dense regions had high importance in the badger- and cattle-specific RF models for distinguishing badger-derived isolates from livestock-derived ones. Additionally, the genes detected within these genomic regions have functions related to pathogenicity, such as virulence and immunogenicity, membrane structure, host survival, and mycobactin production. The results of this study demonstrate how comparative genomics combined with machine learning approaches can be used to further investigate the nature of host-pathogen interactions.
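
The idea of defining SNP dense regions against a null distribution can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract states that the null came from a simulated alignment, whereas the sketch below swaps in a much simpler null that scatters the same number of SNPs uniformly at random along the genome. The function names, the fixed non-overlapping windows, the 0-based positions, and the 0.99 quantile cutoff are all assumptions made for illustration.

```python
import random


def window_snp_counts(snp_positions, genome_length, window_size):
    """Count SNPs in consecutive, non-overlapping windows (0-based positions)."""
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window_size] += 1
    return counts


def null_threshold(n_snps, genome_length, window_size,
                   n_sims=200, quantile=0.99, seed=0):
    """Per-window SNP count cutoff from a uniform-placement null.

    Assumption: SNPs are placed uniformly at random. This stands in for
    the simulated alignment described in the abstract.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_sims):
        simulated = [rng.randrange(genome_length) for _ in range(n_snps)]
        pooled.extend(window_snp_counts(simulated, genome_length, window_size))
    pooled.sort()
    index = min(len(pooled) - 1, int(quantile * len(pooled)))
    return pooled[index]


def dense_windows(snp_positions, genome_length, window_size, threshold):
    """Return (start, end, count) for windows whose count exceeds the cutoff."""
    counts = window_snp_counts(snp_positions, genome_length, window_size)
    return [
        (i * window_size, min((i + 1) * window_size, genome_length), count)
        for i, count in enumerate(counts)
        if count > threshold
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): 50 clustered SNPs plus 10 scattered ones.
    observed = list(range(2000, 2050)) + [
        150, 950, 1200, 3300, 4400, 5500, 6600, 7700, 8800, 9900
    ]
    cutoff = null_threshold(len(observed), 10_000, 1_000)
    print(dense_windows(observed, 10_000, 1_000, cutoff))
```

Pooling window counts across simulations gives a single genome-wide cutoff. A real analysis would more likely use a null that reflects the phylogeny and substitution model, which is what a simulated alignment provides.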

Citing Articles

AliSim-HPC: parallel sequence simulator for phylogenetics.

Ly-Trong N, Barca G, Minh B Bioinformatics. 2023; 39(9).

PMID: 37656933 PMC: 10534053. DOI: 10.1093/bioinformatics/btad540.


The Many Hosts of Mycobacteria 9 (MHM9): A conference report.

Klever A, Alexander K, Almeida D, Anderson M, Ball R, Beamer G Tuberculosis (Edinb). 2023; 142:102377.

PMID: 37531864 PMC: 10529179. DOI: 10.1016/j.tube.2023.102377.
