
Efficient Computation of the Joint Sample Frequency Spectra for Multiple Populations

Overview
Date 2017 Feb 28
PMID 28239248
Citations 42
Abstract

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
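To make the statistic concrete, here is a minimal sketch that tabulates an observed joint SFS from a small, made-up haplotype matrix. It is not the paper's method, which computes the expected joint SFS under a demographic model. The function name `joint_sfs`, the 0/1 encoding (1 = mutant allele), and the population labels are all assumptions chosen for illustration.

```python
from collections import Counter

def joint_sfs(haplotypes, pop_labels):
    """Tabulate an observed joint SFS in sparse form (hypothetical helper).

    haplotypes: list of sites; each site is a list of 0/1 alleles, one per
                sampled sequence, where 1 marks the mutant allele.
    pop_labels: population name for each sampled sequence (same order).
    Returns (pops, sfs): sfs maps a tuple of per-population mutant counts,
    ordered as in pops, to the number of polymorphic sites with that pattern.
    """
    pops = sorted(set(pop_labels))
    # Column indices of the sequences belonging to each population
    columns = {p: [i for i, lab in enumerate(pop_labels) if lab == p] for p in pops}
    n_total = len(pop_labels)
    sfs = Counter()
    for site in haplotypes:
        config = tuple(sum(site[i] for i in columns[p]) for p in pops)
        total = sum(config)
        # The SFS is defined over polymorphic sites, so skip fixed sites
        if total == 0 or total == n_total:
            continue
        sfs[config] += 1
    return pops, sfs

# Toy data (made up): 5 sites, 2 sequences from "A" and 3 from "B"
haps = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],  # monomorphic, excluded from the spectrum
]
labels = ["A", "A", "B", "B", "B"]
pops, sfs = joint_sfs(haps, labels)
for config, n_sites in sorted(sfs.items()):
    print(dict(zip(pops, config)), n_sites)
```

The sparse dictionary is a design choice for the sketch: with many populations the dense joint SFS array has one axis per population, so most entries are zero and only the observed count patterns need to be stored.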

Citing Articles

A General Framework for Branch Length Estimation in Ancestral Recombination Graphs.

Deng Y, Song Y, Nielsen R. bioRxiv. 2025.

PMID: 39990503 PMC: 11844452. DOI: 10.1101/2025.02.14.638385.


Leveraging graphical model techniques to study evolution on phylogenetic networks.

Teo B, Bastide P, Ane C. Philos Trans R Soc Lond B Biol Sci. 2025; 380(1919):20230310.

PMID: 39976402 PMC: 11867149. DOI: 10.1098/rstb.2023.0310.


Exact Decoding of a Sequentially Markov Coalescent Model in Genetics.

Ki C, Terhorst J. J Am Stat Assoc. 2024; 119(547):2242-2255.

PMID: 39323740 PMC: 11421421. DOI: 10.1080/01621459.2023.2252570.


Conditional frequency spectra as a tool for studying selection on complex traits in biobanks.

Patel R, Weiss C, Zhu H, Mostafavi H, Simons Y, Spence J. bioRxiv. 2024.

PMID: 38948697 PMC: 11212903. DOI: 10.1101/2024.06.15.599126.


Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning.

Tran L, Sun C, Struck T, Sajan M, Gutenkunst R. Mol Biol Evol. 2024; 41(5).

PMID: 38636507 PMC: 11082913. DOI: 10.1093/molbev/msae077.

