Deep Neural Networks with Controlled Variable Selection for the Identification of Putative Causal Genetic Variants

Overview

Journal: Nat Mach Intell

Publisher: Springer Nature

Specialty: Biomedical Engineering

Date: 2023 Oct 20

PMID: 37859729

Authors

Peyman H Kassani

Fred Lu

Yann Le Guen

Michael E Belloy

Zihuai He

Abstract

Deep neural networks (DNNs) have been successfully utilized in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. Here we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified pronounced randomness in feature selection in DNNs due to their stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: flexible modelling of the nonlinear effects of genetic variants to improve statistical power; multiple knockoffs in the input layer to rigorously control the false discovery rate; hierarchical layers to substantially reduce the number of weight parameters and activations and to improve computational efficiency; and stabilized feature selection to reduce the randomness in identified signals. We evaluate the proposed method in extensive simulation studies and apply it to the analysis of Alzheimer's disease genetics. We show that the proposed method, when compared with conventional linear and nonlinear methods, can lead to substantially more discoveries.
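
The two selection ideas named in the abstract, knockoff-based false discovery rate control and stabilization by ensembling, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single knockoff copy per variant (the paper uses multiple knockoffs), the standard knockoff+ threshold, and importance scores that have already been extracted from trained networks. All function names and toy numbers are hypothetical.

```python
import numpy as np


def stabilized_importance(importance_runs):
    """Average per-feature importance over independently trained runs.

    importance_runs: array of shape (n_runs, n_features). Averaging is one
    simple way to damp run-to-run randomness (an assumption for illustration).
    """
    return np.asarray(importance_runs, dtype=float).mean(axis=0)


def knockoff_threshold(W, q=0.1):
    """Knockoff+ threshold: the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def select_features(importance_orig, importance_knock, q=0.1):
    """Select features whose importance clearly exceeds their knockoff's.

    W_j = |importance of original j| - |importance of knockoff j|.
    Large positive W_j suggests a real signal; null features give
    W_j that is symmetric around zero.
    """
    W = np.abs(importance_orig) - np.abs(importance_knock)
    t = knockoff_threshold(W, q)
    return np.flatnonzero(W >= t)


if __name__ == "__main__":
    # Toy scores: 20 strong features followed by 4 null features.
    orig = np.array([5.0] * 20 + [0.1] * 4)
    knock = np.array([0.0] * 20 + [0.3] * 4)
    print(select_features(orig, knock, q=0.1))
```

In this sketch the ensembling step would be applied first, with `stabilized_importance` averaging the scores of the original and knockoff features across runs, and the averaged scores would then be passed to `select_features` at a target false discovery rate `q`.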

Citing Articles

A quantitative benchmark of neural network feature selection methods for detecting nonlinear signals.

Passemiers A, Folco P, Raimondi D, Birolo G, Moreau Y, Fariselli P. Sci Rep. 2024; 14(1):31180.

PMID: 39732866 PMC: 11682240. DOI: 10.1038/s41598-024-82583-5.

Designing interpretable deep learning applications for functional genomics: a quantitative analysis.

van Hilten A, Katz S, Saccenti E, Niessen W, Roshchupkin G. Brief Bioinform. 2024; 25(5).

PMID: 39293804 PMC: 11410376. DOI: 10.1093/bib/bbae449.

Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data.

van Hilten A, van Rooij J, Ikram M, Niessen W, van Meurs J, Roshchupkin G. NPJ Syst Biol Appl. 2024; 10(1):81.

PMID: 39095438 PMC: 11297229. DOI: 10.1038/s41540-024-00405-w.

Artificial intelligence for nailfold capillaroscopy analyses - a proof of concept application in juvenile dermatomyositis.

Kassani P, Ehwerhemuepha L, Martin-King C, Kassab R, Gibbs E, Morgan G. Pediatr Res. 2023; 95(4):981-987.

PMID: 37993641 DOI: 10.1038/s41390-023-02894-7.
