
Deep Large-Scale Multitask Learning Network for Gene Expression Inference

Overview
Journal J Comput Biol
Date 2021 May 20
PMID 34014778
Citations 1
Abstract

Gene expression profiling makes it possible to conduct many biological studies in a variety of fields due to its thorough characterization of cellular states under various experimental conditions. Despite recent advances in high-throughput technology, profiling an entire set of genomes is still difficult and expensive. Because the expression patterns of different genes are highly correlated, this problem can be addressed with a cost-effective approach that measures only a small subset of genes, called landmark genes, which represent the entire set of genes, and infers the remaining genes, called target genes, using a computational model. There are several shallow and deep regression models in the literature that estimate the expressions of target genes from the landmark genes. However, the shallow models mostly have limited capacity to learn the nonlinear and complex structure of gene expression data and are prone to underfitting, whereas the deep models generally do not take advantage of the correlation among target genes in the learning process and suffer from overfitting. Treating gene expression inference as a multitask learning problem, we propose a new deep multitask learning algorithm to tackle these issues. Our learning framework automatically learns the correlation between target genes and uses this knowledge to improve its generalization. Specifically, we utilize a subnetwork with low-dimensional latent variables to discover the relationships between target genes and impose a seamless, easy-to-implement regularization on our deep regression model. Unlike the existing multitask learning methods that can only deal with dozens or hundreds of tasks, our algorithm is able to efficiently learn the relationships between ∼10,000 target genes and, thus, is scalable to a large number of tasks.
Our proposed method outperforms the shallow and deep regression models for gene expression inference and alternative multitask learning algorithms on two large-scale datasets regardless of the network architecture.
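
The abstract does not specify the network architecture or the exact form of the regularizer, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats each target gene as one regression task, predicts all targets from the landmark genes with a small one-hidden-layer network, and adds a penalty that pulls the per-gene output weights toward a low-dimensional latent code per target gene. All names (`init_params`, `task_relation_penalty`, `U`, `Z`, `lam`) and the low-rank form of the penalty are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def init_params(n_landmark, n_hidden, n_target, n_latent, seed=0):
    """Randomly initialise a one-hidden-layer regressor plus latent task codes."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_landmark, n_hidden)),  # landmark -> hidden
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_target)),    # one column per target gene
        "b2": np.zeros(n_target),
        # Assumed stand-in for the "subnetwork with low-dimensional latent variables":
        # Z holds an n_latent-dimensional code per target gene, U decodes it.
        "U": rng.normal(0.0, 0.1, (n_hidden, n_latent)),
        "Z": rng.normal(0.0, 0.1, (n_latent, n_target)),
    }


def predict(params, X):
    """Predict all target-gene expressions from landmark-gene expressions."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"]


def task_relation_penalty(params):
    """Mean squared gap between the output weights and their low-rank decoding."""
    residual = params["W2"] - params["U"] @ params["Z"]
    return float(np.mean(residual ** 2))


def multitask_loss(params, X, Y, lam=0.1):
    """Mean squared prediction error plus the weighted task-relation penalty."""
    mse = float(np.mean((predict(params, X) - Y) ** 2))
    return mse + lam * task_relation_penalty(params)


if __name__ == "__main__":
    # Toy sizes only; the abstract mentions roughly 10,000 target genes.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 5))    # 8 samples, 5 landmark genes
    Y = rng.normal(size=(8, 12))   # 12 target genes
    params = init_params(n_landmark=5, n_hidden=4, n_target=12, n_latent=2)
    print(multitask_loss(params, X, Y))
```

In this sketch the latent codes cost `n_latent` numbers per target gene, so the penalty grows linearly with the number of tasks rather than requiring a full task-by-task relationship matrix; whether the published method achieves its scalability in the same way is not stated in the abstract.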

Citing Articles

A Regularized Multi-Task Learning Approach for Cell Type Detection in Single-Cell RNA Sequencing Data.

Upadhyay P, Ray S Front Genet. 2022; 13:788832.

PMID: 35495159 PMC: 9043858. DOI: 10.3389/fgene.2022.788832.
