DeepCOMBI: Explainable Artificial Intelligence for the Analysis and Discovery in Genome-wide Association Studies

Overview

Journal: NAR Genom Bioinform

Publisher: Oxford University Press

Specialty: Biology

Date: 2021 Jul 23

PMID: 34296082

Citations: 16

Authors

Bettina Mieth

Alexandre Rozier

Juan Antonio Rodriguez

Marina M C Hohne

Nico Gornitz

Klaus-Robert Muller

Abstract

Deep learning has revolutionized data science in many fields by greatly improving prediction performances in comparison to conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, the novel three-step algorithm, called DeepCOMBI, first trains a neural network for the classification of subjects into their respective phenotypes. Second, it explains the classifiers' decisions by applying layer-wise relevance propagation as one example from the pool of explanation techniques. The resulting importance scores are eventually used to determine a subset of the most relevant locations for multiple hypothesis testing in the third step. The performance of DeepCOMBI in terms of power and precision is investigated on generated datasets and a 2007 study. Verification of the latter is achieved by validating all findings with independent studies published up until 2020. DeepCOMBI is shown to outperform ordinary raw p-value thresholding and other baseline methods. Two novel disease associations (rs10889923 for hypertension, rs4769283 for type 1 diabetes) were identified.
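The three steps described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation, and several simplifications are assumptions made here for brevity: a logistic-regression classifier stands in for the neural network, the per-SNP relevance is the mean absolute value of weight times centered genotype (used as a simple stand-in for layer-wise relevance propagation), the data are simulated, and the values of `top_k` and `alpha` are arbitrary. All function names are hypothetical.

```python
import math
import random

def simulate(n_subjects=400, n_snps=30, causal=(3, 17), seed=0):
    """Toy data (assumption): genotypes are 0/1/2 allele counts, two SNPs drive the phenotype."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_subjects):
        g = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        logit = sum(1.2 * (g[j] - 1) for j in causal)
        X.append(g)
        y.append(1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0)
    return X, y

def train_classifier(X, y, epochs=100, lr=0.1):
    """Step 1 stand-in: logistic regression by batch gradient descent (not a deep network)."""
    n, d = len(X), len(X[0])
    Xc = [[g - 1 for g in row] for row in X]  # center genotypes around 0
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(Xc, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1 / (1 + math.exp(-z)) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def relevance_scores(w, X):
    """Step 2 stand-in: mean |weight * centered input| per SNP, averaged over subjects."""
    n = len(X)
    return [abs(wj) * sum(abs(row[j] - 1) for row in X) / n
            for j, wj in enumerate(w)]

def chi2_pvalue(genotypes, labels):
    """Pearson chi-square test on the 2x3 phenotype-by-genotype table.

    Assumes all three genotypes occur, so the statistic has 2 degrees of
    freedom and its survival function is exp(-stat / 2).
    """
    counts = [[0, 0, 0], [0, 0, 0]]
    for g, c in zip(genotypes, labels):
        counts[c][g] += 1
    n = len(labels)
    row_tot = [sum(r) for r in counts]
    col_tot = [counts[0][g] + counts[1][g] for g in range(3)]
    stat = 0.0
    for c in range(2):
        for g in range(3):
            expected = row_tot[c] * col_tot[g] / n
            if expected > 0:
                stat += (counts[c][g] - expected) ** 2 / expected
    return math.exp(-stat / 2)

def deepcombi_sketch(X, y, top_k=5, alpha=1e-5):
    """Step 3: test only the top_k most relevant SNPs and threshold their p-values."""
    w, _ = train_classifier(X, y)
    scores = relevance_scores(w, X)
    selected = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]
    pvalues = {j: chi2_pvalue([row[j] for row in X], y) for j in selected}
    hits = sorted(j for j, p in pvalues.items() if p < alpha)
    return sorted(selected), hits

X, y = simulate()
selected, hits = deepcombi_sketch(X, y)
print("SNPs kept for testing:", selected)
print("SNPs below threshold:", hits)
```

The point of the screening step is that hypothesis tests are run only on the SNPs the classifier found relevant, rather than on every position. The fixed `alpha` used here is only a placeholder for whatever significance threshold an actual analysis would calibrate.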

Citing Articles

Leveraging hierarchical structures for genetic block interaction studies using the hierarchical transformer.

Li S, Arora S, Attaoua R, Hamet P, Tremblay J, Bihlo A. medRxiv. 2024.

PMID: 39606365 PMC: 11601704. DOI: 10.1101/2024.11.18.24317486.

AutoXAI4Omics: an automated explainable AI tool for omics and tabular data.

Strudwick J, Gardiner L, Denning-James K, Haiminen N, Evans A, Kelly J. Brief Bioinform. 2024; 26(1).

PMID: 39576223 PMC: 11583442. DOI: 10.1093/bib/bbae593.

Epi-SSA: A novel epistasis detection method based on a multi-objective sparrow search algorithm.

Sun L, Bian J, Xin Y, Jiang L, Zheng L. PLoS One. 2024; 19(10):e0311223.

PMID: 39446852 PMC: 11500897. DOI: 10.1371/journal.pone.0311223.

Designing interpretable deep learning applications for functional genomics: a quantitative analysis.

van Hilten A, Katz S, Saccenti E, Niessen W, Roshchupkin G. Brief Bioinform. 2024; 25(5).

PMID: 39293804 PMC: 11410376. DOI: 10.1093/bib/bbae449.

Distributed transformer for high order epistasis detection in large-scale datasets.

Graca M, Nobre R, Sousa L, Ilic A. Sci Rep. 2024; 14(1):14579.

PMID: 38918413 PMC: 11199512. DOI: 10.1038/s41598-024-65317-5.
