
Assessing Statistical Significance in Multivariable Genome Wide Association Analysis

Overview
Journal Bioinformatics
Specialty Biology
Date 2016 May 7
PMID 27153677
Citations 11
Abstract

Motivation: Although genome-wide association studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the stringent significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS.

Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing the significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether a SNP carries any additional information about the phenotype beyond that provided by all the other SNPs. This rules out spurious associations between phenotypes and SNPs that can arise from marginal methods when a 'spuriously correlated' SNP merely happens to be correlated with the 'truly causal' SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study but were replicated in other independent studies.
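
The distinction between a marginal test and a test that controls for other SNPs can be illustrated with a toy simulation. The sketch below is not the authors' procedure and does not use hierGWAS: it assumes a quantitative phenotype, ordinary least squares in place of a generalized linear model, only two SNPs, a normal approximation for the P-values, and no FWER correction. All names and parameter values (`simulate`, `maf`, `ld`, `effect`) are hypothetical and chosen for illustration only.

```python
import math
import random


def two_sided_p(t):
    """Two-sided p-value for a t-statistic (normal approximation, adequate for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def centered(v):
    """Subtract the mean, which absorbs the intercept of the regression."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def marginal_t(x, y):
    """t-statistic for the slope in y ~ intercept + x (one SNP at a time)."""
    n = len(y)
    xc, yc = centered(x), centered(y)
    sxx = sum(a * a for a in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    beta = sxy / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(xc, yc))
    return beta / math.sqrt(rss / (n - 2) / sxx)


def joint_t(x1, x2, y):
    """t-statistics for both slopes in y ~ intercept + x1 + x2 (each SNP given the other)."""
    n = len(y)
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * w for u, w in zip(a, yc))
    s2y = sum(v * w for v, w in zip(b, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(a, b, yc))
    sigma2 = rss / (n - 3)
    return b1 / math.sqrt(sigma2 * s22 / det), b2 / math.sqrt(sigma2 * s11 / det)


def simulate(n=4000, maf=0.3, ld=0.9, effect=0.5, seed=0):
    """Simulate one causal SNP and one SNP that is merely correlated with it."""
    rng = random.Random(seed)

    def draw():
        # Genotype coded as the number of minor alleles (0, 1 or 2).
        return (rng.random() < maf) + (rng.random() < maf)

    causal = [draw() for _ in range(n)]
    # The linked SNP copies the causal genotype with probability `ld`.
    linked = [g if rng.random() < ld else draw() for g in causal]
    # Only the causal SNP affects the phenotype.
    pheno = [effect * g + rng.gauss(0.0, 1.0) for g in causal]
    t_causal, t_linked = joint_t(causal, linked, pheno)
    return {
        "linked_marginal_p": two_sided_p(marginal_t(linked, pheno)),
        "linked_joint_p": two_sided_p(t_linked),
        "causal_joint_p": two_sided_p(t_causal),
    }


if __name__ == "__main__":
    for name, p in simulate().items():
        print(f"{name}: {p:.3g}")
```

In this setup the linked SNP is expected to look strongly associated when tested alone, while its P-value in the joint model should no longer be extreme, because it carries no information about the phenotype beyond that of the causal SNP.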

Availability and Implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS.

Contact: peter.buehlmann@stat.math.ethz.ch

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

A predominant role of genotypic variation in both expression of sperm competition genes and paternity success in .

Patlar B, Fulham L, Civetta A Proc Biol Sci. 2023; 290(2007):20231715.

PMID: 37727083 PMC: 10509582. DOI: 10.1098/rspb.2023.1715.


Potential application of elastic nets for shared polygenicity detection with adapted threshold selection.

John M, Lencz T Int J Biostat. 2022; 19(2):417-438.

PMID: 36327464 PMC: 10154439. DOI: 10.1515/ijb-2020-0108.


A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions.

Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z Interdiscip Sci. 2022; 14(4):814-832.

PMID: 35788965 DOI: 10.1007/s12539-022-00530-2.


False discovery rate control in genome-wide association studies with population structure.

Sesia M, Bates S, Candes E, Marchini J, Sabatti C Proc Natl Acad Sci U S A. 2021; 118(40).

PMID: 34580220 PMC: 8501795. DOI: 10.1073/pnas.2105841118.


Genome-wide association study and its applications in the non-model crop Sesamum indicum.

Berhe M, Dossa K, You J, Mboup P, Diallo I, Diouf D BMC Plant Biol. 2021; 21(1):283.

PMID: 34157965 PMC: 8218510. DOI: 10.1186/s12870-021-03046-x.

