Permutation Testing in the Presence of Polygenic Variation
This article discusses the problems that arise when performing permutation tests for quantitative trait loci in the presence of polygenic effects, and how to solve them. Permutation testing is a popular way to assess the statistical significance of a test statistic whose distribution is unknown (for instance, the maximum of multiple correlated statistics, or an omnibus statistic for a gene, gene set, or pathway). However, naive application of permutations may yield an invalid test. The risk is particularly acute in complex trait mapping, where polygenicity may combine with population structure arising from families, cryptic relatedness, admixture, or population stratification. I give both analytical derivations and a conceptual explanation of why typical permutation procedures fail, and I propose an alternative permutation-based algorithm, MVNpermute, that succeeds. In particular, I examine the case where a quantitative trait is analyzed with a linear mixed model and show that both phenotype and genotype permutations may produce an invalid test. I provide a formula that predicts the inflation of the type I error rate as a function of the heritability of the trait and the degree to which the covariance structure of the polygenic effect is misspecified. I validate this formula with simulations, showing that the permutation distribution matches the theoretical expectation and that the proposed permutation-based test attains the correct null distribution. Finally, I discuss situations in which naive permutations of the phenotype or genotype are valid, and how the results apply to other test statistics.
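The sketch below illustrates, under simplifying assumptions, why naive phenotype permutation can be anti-conservative with related samples, and what a decorrelate, permute, recorrelate procedure in the spirit of MVNpermute looks like. It is not the author's MVNpermute software, and the variance ratio it computes is a generic property of linear statistics, not the article's inflation formula. Assumptions: the phenotypic covariance Σ (polygenic plus residual) is known, for example plugged in from a fitted linear mixed model. The family-block kinship matrix and all function names (`family_kinship`, `polygenic_cov`, `permutation_variance_ratio`, `gls_null_fit`, `mvn_style_permutations`) are hypothetical. The decorrelated residuals are only approximately exchangeable, because estimating the null mean uses up degrees of freedom.

```python
import numpy as np


def family_kinship(n_fam, fam_size, r=0.5):
    """Hypothetical block-diagonal kinship: relatedness r within a family, 0 between."""
    block = np.full((fam_size, fam_size), r)
    np.fill_diagonal(block, 1.0)
    return np.kron(np.eye(n_fam), block)


def polygenic_cov(K, h2):
    """Phenotypic covariance under a simple polygenic model with unit total variance."""
    return h2 * K + (1.0 - h2) * np.eye(K.shape[0])


def permutation_variance_ratio(w, Sigma):
    """True null variance of T = wc'y over its expected naive-permutation variance.

    Assumes y ~ N(mu * 1, Sigma) and wc = w - mean(w). Permuting y gives
    Var_P(wc'Py) = S_ww * S_yy / (n - 1), with E[S_yy] = tr(Sigma) - 1'Sigma1 / n.
    A ratio > 1 means the permutation null is too narrow (inflated type I error).
    """
    n = len(w)
    wc = w - w.mean()
    true_var = wc @ Sigma @ wc
    expected_syy = np.trace(Sigma) - Sigma.sum() / n
    perm_var = (wc @ wc) * expected_syy / (n - 1)
    return true_var / perm_var


def gls_null_fit(y, X, Sigma):
    """GLS estimate of the covariates-only (null) mean model under covariance Sigma."""
    si_x = np.linalg.solve(Sigma, X)  # Sigma^{-1} X
    si_y = np.linalg.solve(Sigma, y)  # Sigma^{-1} y
    return np.linalg.solve(X.T @ si_x, X.T @ si_y)


def mvn_style_permutations(y, X, Sigma, n_perm, rng):
    """Permute decorrelated null residuals, then re-impose the covariance Sigma.

    Each permuted phenotype is mean + L @ perm(L^{-1} (y - mean)), with Sigma = L L',
    so it keeps (approximately) the null mean and covariance of the original y.
    """
    mean = X @ gls_null_fit(y, X, Sigma)
    L = np.linalg.cholesky(Sigma)      # lower-triangular factor, Sigma = L L'
    z = np.linalg.solve(L, y - mean)   # roughly uncorrelated residuals under H0
    perms = np.empty((n_perm, len(y)))
    for b in range(n_perm):
        perms[b] = mean + L @ rng.permutation(z)
    return perms


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_fam, fam_size = 50, 4
    Sigma = polygenic_cov(family_kinship(n_fam, fam_size), h2=0.6)
    # Genotype-like scores shared within families: the worst case for naive permutation.
    w = np.repeat(rng.binomial(2, 0.3, size=n_fam), fam_size).astype(float)
    print("true / naive-permutation variance:", permutation_variance_ratio(w, Sigma))
    n = n_fam * fam_size
    X = np.ones((n, 1))                # intercept-only null model
    y = rng.multivariate_normal(np.zeros(n), Sigma)
    perms = mvn_style_permutations(y, X, Sigma, n_perm=1000, rng=rng)
    print("permuted phenotypes:", perms.shape)
```

With Σ equal to the identity (unrelated samples, no polygenic covariance), the ratio is exactly 1 and naive permutation is fine. Increasing the within-family relatedness or h2 pushes the ratio above 1 whenever the tested scores cluster within families. Permuting in the decorrelated space avoids this, because every permuted vector is mapped back through the same Cholesky factor L.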