Probability of Detecting Disease-associated Single Nucleotide Polymorphisms in Case-control Genome-wide Association Studies

Overview

Journal: Biostatistics

Publisher: Oxford University Press

Specialty: Public Health

Date: 2007 Sep 18

PMID: 17873152

Citations: 19

Authors

Mitchell H Gail

Ruth M Pfeiffer

William Wheeler

David Pee

Abstract

Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected," namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk, that allow for heterogeneity in genetic risk. DP increases with genetic effect size and case-control sample size and decreases with the number of nondisease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that is required to minimize the total cost of a research program that also includes follow-up studies to examine the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large.
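The definition of DP above can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation code. It assumes independent SNPs, a single disease-associated SNP, and 1-degree-of-freedom trend statistics that are central chi-square for null SNPs and noncentral chi-square for the disease SNP. The function name `simulate_dp` and its parameters are hypothetical, and the noncentrality value is taken as a given input rather than derived from an odds ratio, allele frequency, and sample size.

```python
import random


def simulate_dp(noncentrality, n_snps, top_t, n_reps=2000, seed=0):
    """Monte Carlo estimate of the detection probability (DP) of one disease SNP.

    Assumptions (illustrative, not taken from the paper): SNPs are independent,
    exactly one SNP is disease-associated, and every test statistic is a
    1-df chi-square (noncentral for the disease SNP, central for the rest).
    """
    rng = random.Random(seed)
    shift = noncentrality ** 0.5
    hits = 0
    for _ in range(n_reps):
        # Noncentral 1-df chi-square: square of a normal with mean sqrt(noncentrality).
        disease_stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        # Count null SNPs whose statistic exceeds the disease SNP's statistic.
        n_larger = 0
        for _ in range(n_snps - 1):
            if rng.gauss(0.0, 1.0) ** 2 > disease_stat:
                n_larger += 1
                if n_larger >= top_t:
                    break  # the disease SNP can no longer be in the top T
        # The disease SNP is "T-selected" if fewer than T null SNPs outrank it.
        if n_larger < top_t:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical settings chosen only to keep the run short.
    for t in (1, 10, 50):
        print(t, simulate_dp(noncentrality=9.0, n_snps=1000, top_t=t, n_reps=500))
```

Under these assumptions the estimate depends on the effect size only through the noncentrality parameter, so the sketch can be used to see how DP changes with T and N for a fixed noncentrality; it does not model linkage disequilibrium, multiple disease SNPs, or random effects.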

Citing Articles

Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium.

Zheng G, Joo J, Zaykin D, Wu C, Geller N. Stat Sci. 2024; 24(4):503-516.

PMID: 39635042 PMC: 11616002. DOI: 10.1214/09-sts314.

Genome-wide association analysis of fleece traits in Northwest Xizang white cashmere goat.

Lu X, Suo L, Yan X, Li W, Su Y, Zhou B. Front Vet Sci. 2024; 11:1409084.

PMID: 38872797 PMC: 11171727. DOI: 10.3389/fvets.2024.1409084.

Comparison of approaches for incorporating new information into existing risk prediction models.

Grill S, Ankerst D, Gail M, Chatterjee N, Pfeiffer R. Stat Med. 2016; 36(7):1134-1156.

PMID: 27943382 PMC: 8182952. DOI: 10.1002/sim.7190.

Regionally Smoothed Meta-Analysis Methods for GWAS Datasets.

Begum F, Sharker M, Sherman S, Tseng G, Feingold E. Genet Epidemiol. 2015; 40(2):154-60.

PMID: 26707090 PMC: 4724289. DOI: 10.1002/gepi.21949.

The ranking probability approach and its usage in design and analysis of large-scale studies.

Kuo C, Zaykin D. PLoS One. 2013; 8(12):e83079.

PMID: 24376639 PMC: 3869737. DOI: 10.1371/journal.pone.0083079.