
Interpretable AI for Inference of Causal Molecular Relationships from Omics Data

Overview
Journal Sci Adv
Specialties Biology, Science
Date 2025 Feb 14
PMID 39951525
Abstract

The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
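
The idea of scoring condition-specific regulation with attribution models can be illustrated with a minimal sketch. It is not CIMLA's implementation: the abstract does not name the attribution model or the learner, so this example assumes a linear regression per condition and uses the simple linear attribution "coefficient times deviation from the mean" as a stand-in. All function names, the simulated data, and the scoring rule (root-mean-square difference of attributions between the two models, evaluated on the samples of one condition) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[:-1], w[-1]

def linear_attributions(coef, X, background):
    """Per-sample attribution of each feature: coef_j * (x_ij - background mean_j).
    Assumption: a stand-in for the unnamed attribution model in the abstract."""
    return coef * (X - background.mean(axis=0))

def differential_scores(X_a, y_a, X_b, y_b):
    """Score each candidate regulator by how much its attribution differs
    between the model fit on condition A and the model fit on condition B.
    Both models are evaluated on the condition B samples (an assumption)."""
    coef_a, _ = fit_linear(X_a, y_a)
    coef_b, _ = fit_linear(X_b, y_b)
    phi_a = linear_attributions(coef_a, X_b, X_b)
    phi_b = linear_attributions(coef_b, X_b, X_b)
    return np.sqrt(np.mean((phi_b - phi_a) ** 2, axis=0))

# Simulated toy data (hypothetical): 4 candidate regulators, 1 target gene.
rng = np.random.default_rng(0)
n, p = 500, 4
X_a = rng.normal(size=(n, p))
X_b = rng.normal(size=(n, p))
# Condition A: the target depends on regulator 0 only.
y_a = 2.0 * X_a[:, 0] + 0.1 * rng.normal(size=n)
# Condition B: regulator 1 additionally influences the target.
y_b = 2.0 * X_b[:, 0] + 1.5 * X_b[:, 1] + 0.1 * rng.normal(size=n)

scores = differential_scores(X_a, y_a, X_b, y_b)
top_regulator = int(np.argmax(scores))
print("differential scores:", np.round(scores, 3))
print("top candidate regulator index:", top_regulator)
```

In this toy setup, only regulator 1 changes its influence on the target between conditions, so it should receive the largest score. A linear model is used here only to keep the example self-contained; a nonlinear learner and a different attribution method could be substituted under the same scoring scheme.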
