Greater Power and Computational Efficiency for Kernel-based Association Testing of Sets of Genetic Variants

Overview

Journal Bioinformatics

Publisher Oxford University Press

Specialty Biology

Date 2014 Jul 31

PMID 25075117

Citations 23

Authors

Christoph Lippert

Jing Xiang

Danilo Horta

Christian Widmer

Carl Kadie

David Heckerman

Jennifer Listgarten

Abstract

Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored, even though it may play an important role in obtaining optimal power. We compared a standard statistical test, a score test, with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus, we develop new computationally efficient methods.
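To make the contrast between the two statistics concrete, below is a minimal Python/NumPy sketch for the simplest setting: one SNP set and no background kernel. It assumes the model y = Xb + g + e, where g ~ N(0, σ²h²K), e ~ N(0, σ²(1 − h²)I), and K = GGᵀ/m is built from the m (standardized) genotypes in the set. All function names are hypothetical. The score statistic (shown up to constant factors) needs only a fit of the null model. The LR statistic must maximize the likelihood over h². Here a single eigendecomposition of K makes every likelihood evaluation cost only diagonal operations, which is the general idea behind FaST-LMM-style speedups. This is not the authors' implementation: it uses maximum likelihood with a coarse grid over h² for simplicity, and it omits the null-distribution calibration needed to turn either statistic into a p-value.

```python
import numpy as np

def ols_residuals(y, X):
    """Fit the null model y = X b + e by least squares; return residuals and ML noise variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r, float(r @ r) / len(y)

def score_statistic(y, X, G):
    """Variance-component score statistic Q = r^T K r / sigma2, with K = G G^T / m.

    Only the null model is fitted, which is why score tests are cheap per test.
    """
    r, sigma2 = ols_residuals(y, X)
    Gtr = G.T @ r                      # avoids forming the n x n kernel explicitly
    return float(Gtr @ Gtr) / (sigma2 * G.shape[1])

def profile_loglik(h2, y_rot, X_rot, s):
    """ML log-likelihood of y ~ N(X b, sigma2 * (h2 K + (1 - h2) I)), profiled over b and sigma2.

    Inputs are already rotated into the eigenbasis of K (eigenvalues s), so the
    covariance is diagonal with entries d_i = h2 * s_i + (1 - h2).
    """
    n = len(y_rot)
    d = h2 * s + (1.0 - h2)
    w = 1.0 / d
    XtW = X_rot.T * w                  # X^T D^{-1} via broadcasting
    beta = np.linalg.solve(XtW @ X_rot, XtW @ y_rot)
    r = y_rot - X_rot @ beta
    sigma2 = float(r @ (w * r)) / n
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(np.log(d)) + n)

def lr_statistic(y, X, G, grid=np.linspace(0.0, 0.99, 100)):
    """LR statistic 2 * (max_h2 loglik(h2) - loglik(0)); returns (statistic, best h2)."""
    K = G @ G.T / G.shape[1]
    s, U = np.linalg.eigh(K)           # one O(n^3) decomposition, reused for every h2
    s = np.clip(s, 0.0, None)          # drop tiny negative eigenvalues from round-off
    y_rot, X_rot = U.T @ y, U.T @ X
    ll = np.array([profile_loglik(h2, y_rot, X_rot, s) for h2 in grid])
    return float(2.0 * (ll.max() - ll[0])), float(grid[ll.argmax()])

# Usage on simulated data (stand-in genotypes, intercept-only covariates).
rng = np.random.default_rng(0)
n, m = 200, 10
X = np.ones((n, 1))
G = rng.standard_normal((n, m))
y = G @ rng.normal(0.0, 0.3, m) + rng.standard_normal(n)
print(score_statistic(y, X, G), lr_statistic(y, X, G))
```

Because the grid starts at h² = 0, where the model reduces to ordinary least squares, the LR statistic is non-negative by construction. A real implementation would instead use a 1-D optimizer and a calibrated null distribution.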

Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more), whereas in the two remaining datasets the score test yielded at most one more association than the LR test. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but we also observe a regime of extremely small signal in which the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as in gene-gene interaction tests. The latter yielded a 2000-fold speedup on a cohort of 13 500 individuals.

Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/.

Contact: heckerma@microsoft.com

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Alternative splicing induces sample-level variation in gene-gene correlations.

Lu Y, Pierce B, Wang P, Yang F, Chen L BMC Genomics. 2024; 23(Suppl 4):867.

PMID: 39658796 PMC: 11633002. DOI: 10.1186/s12864-024-11118-z.

A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits.

Fu B, Anand P, Anand A, Mefford J, Sankararaman S Genome Res. 2024; 34(9):1294-1303.

PMID: 39209554 PMC: 11529862. DOI: 10.1101/gr.279140.124.

Population genomics of Agrotis segetum provide insights into the local adaptive evolution of agricultural pests.

Wang P, Jin M, Wu C, Peng Y, He Y, Wang H BMC Biol. 2024; 22(1):42.

PMID: 38378556 PMC: 10877822. DOI: 10.1186/s12915-024-01844-x.

Higher-order genetic interaction discovery with network-based biological priors.

Pellizzoni P, Muzio G, Borgwardt K Bioinformatics. 2023; 39(39 Suppl 1):i523-i533.

PMID: 37387173 PMC: 10311320. DOI: 10.1093/bioinformatics/btad273.

networkGWAS: a network-based approach to discover genetic associations.

Muzio G, O'Bray L, Meng-Papaxanthos L, Klatt J, Fischer K, Borgwardt K Bioinformatics. 2023; 39(6).

PMID: 37285313 PMC: 10281858. DOI: 10.1093/bioinformatics/btad370.
