Population Substructure and Control Selection in Genome-wide Association Studies

Overview
Journal: PLoS One
Date: 2008 Jul 4
PMID: 18596976
Citations: 79
Abstract

Two questions pose a major challenge in the design and analysis of genome-wide association studies (GWAS): how much the demanding classical epidemiologic criteria for control selection matter, and how robustly population stratification (PS) must be handled. Empirical data from two GWAS in European Americans from the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies, nested in corresponding prospective cohorts, only a minor confounding effect due to PS was observed (inflation factor λ of 1.025 and 1.005). In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (λ of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pairwise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS; it identified a smaller set of principal components and achieved better control of type I error (bringing λ to 1.032 and 1.006, respectively) than currently used methods. The sets of SNPs in the bottom 5% of p-values under the new test and under the test without PS correction overlapped by about 80%, and most of the discordant SNPs had both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can still have acceptable type I error when an effective strategy for the correction of PS is employed.
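
The abstract reports inflation factors without saying how they were computed. A common genomic-control definition, assumed here, is the median of the observed 1-df chi-square association statistics divided by the median of the null chi-square distribution (about 0.455). The sketch below illustrates that definition using only the Python standard library. The function names are hypothetical, and it assumes each SNP's p-value comes from a 1-df test. It is not the CGEMS pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chi2_from_pvalue(p):
    """Convert a p-value from a 1-df chi-square test back to its statistic.

    Assumes 0 < p <= 1. The lower tail (p / 2) is used so that very small
    p-values do not round to 1.0 before the quantile is taken.
    """
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def inflation_factor(pvalues):
    """Genomic-control lambda: observed median chi-square / null median."""
    # Median of a 1-df chi-square variable, about 0.455.
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    observed_median = median(chi2_from_pvalue(p) for p in pvalues)
    return observed_median / expected_median


if __name__ == "__main__":
    # Evenly spaced p-values stand in for a null (unstratified) scan.
    null_like = [(i + 0.5) / 1001 for i in range(1001)]
    print(round(inflation_factor(null_like), 3))
```

Under this definition a value near 1 indicates little inflation, which is how the abstract's figures of 1.025 and 1.005 read against 1.090 and 1.062.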

Citing Articles

Adjusting for principal components can induce spurious associations in genome-wide association studies in admixed populations.

Grinde K, Browning B, Reiner A, Thornton T, Browning S. bioRxiv. 2024.

PMID: 38617337 PMC: 11014513. DOI: 10.1101/2024.04.02.587682.


St. Jude Survivorship Portal: Sharing and Analyzing Large Clinical and Genomic Datasets from Pediatric Cancer Survivors.

Matt G, Sioson E, Shelton K, Wang J, Lu C, Zaldivar Peraza A. Cancer Discov. 2024; 14(8):1403-1417.

PMID: 38593228 PMC: 11294819. DOI: 10.1158/2159-8290.CD-23-1441.


Population stratification correction using Bayesian shrinkage priors for genetic association studies.

Liu Z, Turkmen A, Lin S. Ann Hum Genet. 2023; 87(6):302-315.

PMID: 37771252 PMC: 11624906. DOI: 10.1111/ahg.12527.


Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes.

Brown D, Lin S, Loh P, Chanock S, Savage S, Machiela M. PLoS Genet. 2020; 16(10):e1009078.

PMID: 33090998 PMC: 7608979. DOI: 10.1371/journal.pgen.1009078.


Polygenic risk score for the prediction of breast cancer is related to lesser terminal duct lobular unit involution of the breast.

Bodelon C, Oh H, Derkach A, Sampson J, Sprague B, Vacek P. NPJ Breast Cancer. 2020; 6:41.

PMID: 32964115 PMC: 7477555. DOI: 10.1038/s41523-020-00184-7.

