Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
Overview
Natural selection leaves a spatial pattern along the genome: it distorts the haplotype distribution near the selected locus, and the distortion fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. To this end, we apply the wavelet transform, multitaper spectral analysis, and the S-transform to summary statistic arrays. Each method converts a one-dimensional summary statistic array into a two-dimensional spectral image, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining the resulting models through ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will be a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
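To make the conversion from a one-dimensional summary statistic array to a two-dimensional spectral image concrete, here is a minimal sketch of one of the three transforms, a continuous wavelet transform. It is an illustration under stated assumptions, not the authors' pipeline: the Morlet wavelet, the scale values, the toy statistic, and the names `morlet` and `cwt_scalogram` are all hypothetical choices made for this example, and the transform is written in plain Python for readability rather than speed.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet wavelet (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, omega0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per window."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset
    image = []
    for s in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                # conjugated wavelet, shifted to position b and dilated by s
                acc += centered[k] * morlet((k - b) / s, omega0).conjugate()
            row.append(abs(acc) / math.sqrt(s))
        image.append(row)
    return image


# Toy "summary statistic" (made-up values, assumption): a flat background
# with a localized dip at the central window, loosely mimicking reduced
# diversity around a sweep.
n_windows = 64
stat = [1.0 - 0.8 * math.exp(-((i - 32) / 4.0) ** 2) for i in range(n_windows)]

scales = [2.0, 4.0, 8.0, 16.0]  # hypothetical choice of wavelet scales
image = cwt_scalogram(stat, scales)  # 4 x 64 array: scale by genomic window
```

Each row of `image` shows how strongly the statistic varies at one scale, and each column corresponds to one genomic window, so a localized dip appears as a band concentrated near the central columns. An image of this kind, presumably one per summary statistic, is the sort of input the abstract describes feeding to a convolutional neural network. In practice a vectorized or library implementation would replace the explicit loops.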