A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

Overview
Journal Genetics
Specialty Genetics
Date 2012 Sep 11
PMID 22960215
Citations 29
Abstract

The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate ($\theta_{\mathrm{anc}} = 4N_e u$) and the proportion of males obtaining access to matings per breeding season ($\omega$). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the $L_2$-loss performs best. Applying that method to the ibex data, we estimate $\theta_{\mathrm{anc}} \approx 1.288$ and find that most of the variation across loci of the ancestral mutation rate $u$ is between $7.7 \times 10^{-4}$ and $3.5 \times 10^{-3}$ per locus per generation. The proportion of males with access to matings is estimated as $\omega \approx 0.21$, which is in good agreement with recent independent estimates.
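The general idea of choosing summary statistics by boosting can be illustrated with a minimal sketch. The code below is not taken from the paper: it assumes a toy model (normal data with unknown mean), a hypothetical set of six candidate statistics, a uniform prior, and plain rejection ABC. It uses component-wise boosting with the $L_2$-loss and linear base learners to combine the candidate statistics into a single summary, and it omits the local (neighborhood) variant described above. All function names and tuning values (`n_steps`, `nu`, `keep`) are illustrative assumptions.

```python
import numpy as np


def simulate(theta, rng, n=50):
    """Toy model (assumption): n draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)


def candidate_stats(x):
    """Hypothetical candidate statistics; only some are informative about theta."""
    return np.array([x.mean(), np.median(x), x.var(),
                     x.min(), x.max(), np.abs(x).mean()])


def l2_boost(S, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with linear base learners.

    Each step fits the current residuals on every standardized statistic
    separately by least squares, then updates only the best-fitting one
    by a shrunken step nu.
    """
    mu, sd = S.mean(axis=0), S.std(axis=0)
    Z = (S - mu) / sd                 # columns: mean 0, variance 1
    intercept = y.mean()
    coef = np.zeros(Z.shape[1])
    resid = y - intercept
    for _ in range(n_steps):
        slopes = Z.T @ resid / len(y)     # least-squares slope per column
        j = np.argmax(np.abs(slopes))     # largest reduction in squared error
        coef[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * Z[:, j]

    def predict(s):
        return intercept + ((s - mu) / sd) @ coef

    return predict, coef


def abc_rejection(x_obs, n_sims=5000, keep=0.02, seed=0):
    """Rejection ABC using the boosted predictor as a one-dimensional summary."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-5.0, 5.0, size=n_sims)   # prior (assumption)
    S = np.array([candidate_stats(simulate(t, rng)) for t in thetas])
    predict, coef = l2_boost(S, thetas)
    dist = np.abs(predict(S) - predict(candidate_stats(x_obs)))
    accepted = np.argsort(dist)[: int(keep * n_sims)]
    return thetas[accepted], coef


# Pseudo-observed data generated with a known mean of 2.0.
x_obs = simulate(2.0, np.random.default_rng(1))
posterior, coef = abc_rejection(x_obs)
print("posterior mean:", posterior.mean())
print("boosting coefficients:", np.round(coef, 3))
```

In this sketch, statistics that boosting never selects keep a coefficient of zero and so drop out of the summary, which is how the dimensionality reduction arises. A local version would, under the same assumptions, refit the boosting step using only simulations close to a pilot estimate of the parameter.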

Citing Articles

Population Genetics of Snails from New-Emerging Snail Habitats in a Currently Non-Endemic Area.

Cheng Y, Sun M, Wang N, Gao C, Peng H, Zhang J. Trop Med Infect Dis. 2023; 8(1).

PMID: 36668949 PMC: 9861412. DOI: 10.3390/tropicalmed8010042.


ABCDP: Approximate Bayesian Computation with Differential Privacy.

Park M, Vinaroz M, Jitkrittum W. Entropy (Basel). 2021; 23(8).

PMID: 34441101 PMC: 8391538. DOI: 10.3390/e23080961.


The genomic history of the Aegean palatial civilizations.

Clemente F, Unterlander M, Dolgova O, Amorim C, Coroado-Santos F, Neuenschwander S. Cell. 2021; 184(10):2565-2586.e21.

PMID: 33930288 PMC: 8127963. DOI: 10.1016/j.cell.2021.03.039.


Waves Out of the Korean Peninsula and Inter- and Intra-Species Replacements in Freshwater Fishes in Japan.

Taniguchi S, Bertl J, Futschik A, Kishino H, Okazaki T. Genes (Basel). 2021; 12(2).

PMID: 33669929 PMC: 7924830. DOI: 10.3390/genes12020303.


Likelihood-free inference via classification.

Gutmann M, Dutta R, Kaski S, Corander J. Stat Comput. 2020; 28(2):411-425.

PMID: 31997856 PMC: 6956883. DOI: 10.1007/s11222-017-9738-6.

