FastANOVA: an Efficient Algorithm for Genome-Wide Association Study

Overview
Journal KDD
Date 2010 Oct 16
PMID 20945829
Citations 17
Abstract

Studying the association between quantitative phenotypes (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies, and important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can reach the millions, so evaluating joint effects is a challenging task even for SNP-pairs. Moreover, because many SNPs are correlated, a permutation procedure is preferred over a simple Bonferroni correction for properly controlling the family-wise error rate while retaining mapping power, and this dramatically increases the computational cost of an association study.

In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. We derive an upper bound on the SNP-pair ANOVA test statistic that can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term depends only on the SNPs and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than a brute-force implementation of ANOVA tests on all SNP-pairs.
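To make the cost of the brute-force baseline concrete, the following is a minimal Python sketch of an exhaustive SNP-pair ANOVA scan with a permutation-based significance threshold. It is not the FastANOVA algorithm: the abstract does not give the upper-bound formula, so no pruning is attempted here. The function names (`f_statistic`, `pair_scan`, `permutation_threshold`), the genotype encoding as small non-negative integers, and the choice to treat each joint genotype of a pair as one ANOVA group are all assumptions made for illustration.

```python
import itertools
import numpy as np

def f_statistic(groups, y):
    """One-way ANOVA F-statistic of phenotype y across integer group labels."""
    labels, inverse = np.unique(groups, return_inverse=True)
    k, n = len(labels), len(y)
    if k < 2 or n <= k:
        return 0.0  # test is undefined; treat as no evidence
    counts = np.bincount(inverse, minlength=k)
    sums = np.bincount(inverse, weights=y, minlength=k)
    grand_mean = y.mean()
    ss_between = np.sum(counts * (sums / counts - grand_mean) ** 2)
    ss_within = np.sum((y - grand_mean) ** 2) - ss_between
    if ss_within <= 0:
        return float("inf")  # groups explain the phenotype exactly
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

def pair_scan(X, y):
    """Brute force: F-statistic for every SNP-pair.

    X is an (n_samples, n_snps) array of non-negative integer genotypes.
    Assumption: each distinct joint genotype of a pair forms one group.
    """
    base = int(X.max()) + 1
    scores = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        joint = X[:, i] * base + X[:, j]  # unique code per joint genotype
        scores[(i, j)] = f_statistic(joint, y)
    return scores

def permutation_threshold(X, y, n_perm=100, alpha=0.05, seed=0):
    """Family-wise threshold: (1 - alpha) quantile of the maximum pair
    statistic over phenotype permutations."""
    rng = np.random.default_rng(seed)
    max_stats = [
        max(pair_scan(X, rng.permutation(y)).values()) for _ in range(n_perm)
    ]
    return float(np.quantile(max_stats, 1 - alpha))
```

Each permutation repeats the full scan over all pairs, so the work grows with the number of SNP-pairs times the number of permutations. This is the cost that the upper-bound pruning described in the abstract is designed to avoid.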

Citing Articles

EpiMOGA: An Epistasis Detection Method Based on a Multi-Objective Genetic Algorithm.

Chen Y, Xu F, Pian C, Xu M, Kong L, Fang J Genes (Basel). 2021; 12(2).

PMID: 33525573 PMC: 7911965. DOI: 10.3390/genes12020191.


Epi-GTBN: an approach of epistasis mining based on genetic Tabu algorithm and Bayesian network.

Guo Y, Zhong Z, Yang C, Hu J, Jiang Y, Liang Z BMC Bioinformatics. 2019; 20(1):444.

PMID: 31455207 PMC: 6712799. DOI: 10.1186/s12859-019-3022-z.


The early transcriptome response of cassava (Manihot esculenta Crantz) to mealybug (Phenacoccus manihoti) feeding.

Rauwane M, Odeny D, Millar I, Rey C, Rees J PLoS One. 2018; 13(8):e0202541.

PMID: 30133510 PMC: 6105004. DOI: 10.1371/journal.pone.0202541.


The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation.

Ritchie M, Van Steen K Ann Transl Med. 2018; 6(8):157.

PMID: 29862246 PMC: 5952010. DOI: 10.21037/atm.2018.04.05.


An Efficient Nonlinear Regression Approach for Genome-wide Detection of Marginal and Interacting Genetic Variations.

Lee S, Lozano A, Kambadur P, Xing E J Comput Biol. 2016; 23(5):372-89.

PMID: 27159633 PMC: 4876555. DOI: 10.1089/cmb.2015.0202.

