
Learning Epistatic Polygenic Phenotypes with Boolean Interactions

Overview
Journal PLoS One
Date 2024 Apr 16
PMID 38625909
Abstract

Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and large-scale genome-wide data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
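The idea of comparing a no-epistasis model with a Boolean-interaction model on hold-out data, then bootstrapping the test set, can be illustrated with a minimal sketch. This is not the epiTree implementation: the data are simulated, the null is assumed to be a main-effects logistic model rather than the tree-structured null described above, the candidate rule `x1 AND x2` is given rather than found by iRF, and the bootstrap fraction is only a simplified stand-in for a PCS epistasis p-value. All function names and effect sizes are hypothetical.

```python
import math
import random

def simulate(n, rng):
    """Two binary variant indicators; phenotype risk jumps only when both are 1."""
    rows = []
    for _ in range(n):
        x1, x2 = int(rng.random() < 0.5), int(rng.random() < 0.5)
        risk = 0.8 if (x1 and x2) else 0.1  # assumed toy effect sizes
        rows.append((x1, x2, int(rng.random() < risk)))
    return rows

def cell_stats(rows):
    """Count rows and cases in each of the four (x1, x2) cells."""
    stats = {(a, b): [0, 0] for a in (0, 1) for b in (0, 1)}
    for x1, x2, y in rows:
        stats[(x1, x2)][0] += 1
        stats[(x1, x2)][1] += y
    return stats

def clip(p, eps=1e-6):
    """Keep probabilities away from 0 and 1 so logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)

def fit_null(stats, steps=3000, lr=1.0):
    """Assumed null: main-effects logistic model (no interaction), gradient ascent."""
    n_total = sum(n for n, _ in stats.values())
    b0 = b1 = b2 = 0.0
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for (a, b), (n, cases) in stats.items():
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * a + b2 * b)))
            resid = cases - n * p  # observed minus expected cases in this cell
            g0 += resid
            g1 += a * resid
            g2 += b * resid
        b0 += lr * g0 / n_total
        b1 += lr * g1 / n_total
        b2 += lr * g2 / n_total
    return lambda x1, x2: clip(1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2))))

def fit_alt(stats):
    """Boolean rule model: one risk when 'x1 AND x2' holds, another otherwise."""
    n_in, cases_in = stats[(1, 1)]
    n_out = sum(n for key, (n, _) in stats.items() if key != (1, 1))
    cases_out = sum(c for key, (_, c) in stats.items() if key != (1, 1))
    p_in, p_out = clip(cases_in / n_in), clip(cases_out / n_out)
    return lambda x1, x2: p_in if (x1 and x2) else p_out

def row_gain(null, alt, row):
    """Per-row log-likelihood ratio of the Boolean model over the null."""
    x1, x2, y = row
    p0, p1 = null(x1, x2), alt(x1, x2)
    return math.log(p1 / p0) if y else math.log((1 - p1) / (1 - p0))

rng = random.Random(0)
train, test = simulate(4000, rng), simulate(4000, rng)
stats = cell_stats(train)  # both models are fit on training data only
null_model, alt_model = fit_null(stats), fit_alt(stats)
gains = [row_gain(null_model, alt_model, r) for r in test]
total_gain = sum(gains)

# Bootstrap the hold-out rows: how often does the Boolean model fail to win?
n_boot = 200
not_better = sum(
    sum(rng.choices(gains, k=len(gains))) <= 0 for _ in range(n_boot)
)
p_value = not_better / n_boot
print(f"test log-likelihood gain: {total_gain:.1f}, bootstrap p-value: {p_value:.3f}")
```

Fitting on one split and scoring on another keeps the comparison honest: the Boolean rule only earns a small bootstrap fraction if it predicts unseen individuals better than the main-effects model does.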

Citing Articles

Patterns of Fitness and Gene Expression Epistasis Generated by Beneficial Mutations in the rho and rpoB Genes of Escherichia coli during High-Temperature Adaptation.

Gonzalez-Gonzalez A, Batarseh T, Rodriguez-Verdugo A, Gaut B. Mol Biol Evol. 2024; 41(9).

PMID: 39235107 PMC: 11414761. DOI: 10.1093/molbev/msae187.


A blood-based metabolomic signature predictive of risk for pancreatic cancer.

Irajizad E, Kenney A, Tang T, Vykoukal J, Wu R, Murage E. Cell Rep Med. 2023; 4(9):101194.

PMID: 37729870 PMC: 10518621. DOI: 10.1016/j.xcrm.2023.101194.


Detecting gene-gene interactions from GWAS using diffusion kernel principal components.

Walakira A, Ocira J, Duroux D, Fouladi R, Moskon M, Rozman D. BMC Bioinformatics. 2022; 23(1):57.

PMID: 35105309 PMC: 8805268. DOI: 10.1186/s12859-022-04580-7.
