Gene Expression Inference with Deep Learning

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2016 Feb 14
PMID: 26873929
Citations: 137
Abstract

Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiles over thousands of samples is still very expensive. Recognizing that the expression levels of genes are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), which limits its accuracy because it does not capture complex nonlinear relationships among the expression levels of genes.
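The LR strategy described above can be illustrated with a minimal sketch, assuming ordinary least squares with an intercept and fully synthetic data. The function names, array sizes, and the toy nonlinear term are hypothetical and are not taken from the LINCS pipeline or from D-GEX. The sketch shows that a linear model leaves a residual error when target expression depends nonlinearly on landmark expression.

```python
import numpy as np


def fit_linear_baseline(landmarks, targets):
    """Least-squares fit of every target gene on all landmark genes plus an intercept."""
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights  # shape: (n_landmarks + 1, n_targets)


def predict_linear_baseline(weights, landmarks):
    """Apply the fitted weights (last row is the intercept) to new landmark profiles."""
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    return design @ weights


# Synthetic stand-in for expression data (assumption: sizes are illustrative only).
rng = np.random.default_rng(0)
n_samples, n_landmarks, n_targets = 200, 5, 3
landmarks = rng.normal(size=(n_samples, n_landmarks))
true_weights = rng.normal(size=(n_landmarks, n_targets))

# Toy targets: a linear part plus an interaction term a linear model cannot represent.
interaction = 0.5 * np.tanh(landmarks[:, :1] * landmarks[:, 1:2])
targets = landmarks @ true_weights + interaction

weights = fit_linear_baseline(landmarks, targets)
predictions = predict_linear_baseline(weights, landmarks)
residual_mae = np.abs(predictions - targets).mean()
print(f"Residual mean absolute error of the linear fit: {residual_mae:.4f}")
```

The residual stays above zero here because the interaction term is not a linear function of the landmarks, which is the limitation the abstract attributes to LR.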

Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to that of other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with a 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with a 6.57% relative improvement, and achieves lower error in 81.31% of the target genes.
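The two summary statistics reported above can be sketched as follows. This is a minimal illustration with toy numbers, assuming that relative improvement is defined as the reduction in overall mean absolute error divided by the baseline's error. The abstract does not spell out the formula, so that definition, the function names, and the data are assumptions.

```python
def gene_wise_mae(truth, predicted):
    """Mean absolute error per target gene.

    truth and predicted are lists of samples; each sample is a list with
    one expression value per target gene.
    """
    n_samples = len(truth)
    n_genes = len(truth[0])
    return [
        sum(abs(truth[s][g] - predicted[s][g]) for s in range(n_samples)) / n_samples
        for g in range(n_genes)
    ]


def compare_models(truth, baseline_pred, model_pred):
    """Return (relative improvement in %, share of genes with lower error in %)."""
    baseline_mae = gene_wise_mae(truth, baseline_pred)
    model_mae = gene_wise_mae(truth, model_pred)
    overall_baseline = sum(baseline_mae) / len(baseline_mae)
    overall_model = sum(model_mae) / len(model_mae)
    # Assumed definition: error reduction relative to the baseline's overall error.
    relative_improvement = 100.0 * (overall_baseline - overall_model) / overall_baseline
    # Gene-wise comparison: count genes where the model's error is strictly lower.
    n_better = sum(m < b for m, b in zip(model_mae, baseline_mae))
    genes_improved = 100.0 * n_better / len(model_mae)
    return relative_improvement, genes_improved


# Toy data: two samples, two target genes (values are illustrative only).
truth = [[1.0, 2.0], [3.0, 4.0]]
baseline_pred = [[1.5, 2.5], [3.5, 4.5]]
model_pred = [[1.25, 2.5], [3.25, 4.5]]

relative_improvement, genes_improved = compare_models(truth, baseline_pred, model_pred)
print(relative_improvement, genes_improved)
```

Reporting both numbers is useful because an average improvement can be driven by a few genes, whereas the gene-wise share shows how broadly the lower error holds.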

Availability and Implementation: D-GEX is available at https://github.com/uci-cbcl/D-GEX

Contact: xhx@ics.uci.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Multiomics Research: Principles and Challenges in Integrated Analysis.

Luo Y, Zhao C, Chen F Biodes Res. 2025; 6:0059.

PMID: 39990095 PMC: 11844812. DOI: 10.34133/bdr.0059.


Genome-wide association study on color-image-based convolutional neural networks.

Liu H, Liu Z, Li Z, Yu C, Hu P, Liu Q PeerJ. 2025; 13:e18822.

PMID: 39822975 PMC: 11737327. DOI: 10.7717/peerj.18822.


Understanding relationships between epigenetic marks and their application to robust assignment of chromatin states.

Murgas L, Pollastri G, Riquelme E, Saez M, Martin A Brief Bioinform. 2024; 26(1).

PMID: 39658206 PMC: 11631260. DOI: 10.1093/bib/bbae638.


mRNA vaccine sequence and structure design and optimization: Advances and challenges.

Jin L, Zhou Y, Zhang S, Chen S J Biol Chem. 2024; 301(1):108015.

PMID: 39608721 PMC: 11728972. DOI: 10.1016/j.jbc.2024.108015.


Compare three deep learning-based artificial intelligence models for classification of calcified lumbar disc herniation: a multicenter diagnostic study.

Liu Z, Zhang H, Zhang M, Qu C, Li L, Sun Y Front Surg. 2024; 11:1458569.

PMID: 39569028 PMC: 11576459. DOI: 10.3389/fsurg.2024.1458569.

