Shallow Sparsely-Connected Autoencoders for Gene Set Projection

Overview

Journal Pac Symp Biocomput

Publisher World Scientific

Specialty Biology

Date 2019 Apr 10

PMID 30963076

Citations 8

Authors

Maxwell P Gold

Alexander Lenail

Ernest Fraenkel

Abstract

When analyzing biological data, it can be helpful to consider gene sets, or predefined groups of biologically related genes. Methods exist for identifying gene sets that are differential between conditions, but large public datasets from consortium projects and single-cell RNA-Sequencing have opened the door for gene set analysis using more sophisticated machine learning techniques, such as autoencoders and variational autoencoders. We present shallow sparsely-connected autoencoders (SSCAs) and variational autoencoders (SSCVAs) as tools for projecting gene-level data onto gene sets. We tested these approaches on single-cell RNA-Sequencing data from blood cells and on RNA-Sequencing data from breast cancer patients. Both SSCA and SSCVA can recover known biological features from these datasets and the SSCVA method often outperforms SSCA (and six existing gene set scoring algorithms) on classification and prediction tasks.
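As a rough illustration of what "shallow" and "sparsely-connected" could mean in practice, the sketch below builds a one-hidden-layer autoencoder in NumPy in which each hidden node stands for one gene set and is wired only to that set's member genes. This is not the authors' implementation. The tanh activation, the masked decoder, the plain gradient-descent update, and every name (`build_mask`, `SparseAutoencoder`, the toy genes and sets) are assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def build_mask(genes, gene_sets):
    """Binary (n_genes, n_sets) matrix: 1 where a gene belongs to a set."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for g in members:
            if g in index:          # ignore set members absent from the data
                mask[index[g], j] = 1.0
    return mask

class SparseAutoencoder:
    """One hidden layer; weights outside the membership mask stay at zero."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        self.W_enc = rng.normal(0, 0.1, mask.shape) * mask      # genes -> sets
        self.W_dec = rng.normal(0, 0.1, mask.T.shape) * mask.T  # sets -> genes

    def encode(self, X):
        # One activation per gene set for every sample.
        return np.tanh(X @ self.W_enc)

    def decode(self, H):
        return H @ self.W_dec

    def train_step(self, X, lr=0.01):
        n = X.shape[0]
        H = self.encode(X)
        err = self.decode(H) - X
        # Gradients of 0.5 * sum(err**2) / n with respect to each weight matrix.
        grad_dec = H.T @ err / n
        dZ = (err @ self.W_dec.T) * (1.0 - H ** 2)   # back through tanh
        grad_enc = X.T @ dZ / n
        # Multiplying by the mask keeps non-member connections at zero.
        self.W_dec -= lr * grad_dec * self.mask.T
        self.W_enc -= lr * grad_enc * self.mask
        return float(np.mean(err ** 2))              # loss before the update

# Toy data: 4 hypothetical genes grouped into 2 hypothetical gene sets.
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"set_a": ["G1", "G2"], "set_b": ["G3", "G4"]}
mask = build_mask(genes, gene_sets)

rng = np.random.default_rng(1)
z = rng.normal(size=(100, 2))                        # one hidden signal per set
X = np.repeat(z, 2, axis=1) + 0.1 * rng.normal(size=(100, 4))

model = SparseAutoencoder(mask)
losses = [model.train_step(X, lr=0.05) for _ in range(300)]
scores = model.encode(X)                             # (100 samples, 2 gene sets)
```

Here `scores` plays the role of the gene set projection: one value per sample per gene set, which could then feed a downstream classifier. A variational version would replace the deterministic hidden layer with a sampled one and add a KL term to the loss. That is left out here.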

Citing Articles

NetActivity enhances transcriptional signals by combining gene expression into robust gene set activity scores through interpretable autoencoders.

Ruiz-Arenas C, Marin-Goni I, Wang L, Ochoa I, Perez-Jurado L, Hernaez M. Nucleic Acids Res. 2024; 52(9):e44.

PMID: 38597610 PMC: 11109970. DOI: 10.1093/nar/gkae197.

Assessment of emerging pretraining strategies in interpretable multimodal deep learning for cancer prognostication.

Azher Z, Suvarna A, Chen J, Zhang Z, Christensen B, Salas L. BioData Min. 2023; 16(1):23.

PMID: 37481666 PMC: 10363299. DOI: 10.1186/s13040-023-00338-w.

Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review.

Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Genomics Proteomics Bioinformatics. 2022; 20(5):814-835.

PMID: 36528240 PMC: 10025684. DOI: 10.1016/j.gpb.2022.11.011.

Artificial neural networks enable genome-scale simulations of intracellular signaling.

Nilsson A, Peters J, Meimetis N, Bryson B, Lauffenburger D. Nat Commun. 2022; 13(1):3069.

PMID: 35654811 PMC: 9163072. DOI: 10.1038/s41467-022-30684-y.

Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis.

Alessandri L, Ratto M, Contaldo S, Beccuti M, Cordero F, Arigoni M. Int J Mol Sci. 2021; 22(23).

PMID: 34884559 PMC: 8657975. DOI: 10.3390/ijms222312755.
