Embedding of Genes Using Cancer Gene Expression Data: Biological Relevance and Potential Application on Biomarker Discovery

Overview
Journal: Front Genet
Date: 2019 Jan 22
PMID: 30662451
Citations: 19
Abstract

Artificial neural networks (ANNs) have been used for classification and prediction tasks with remarkable accuracy. However, their potential for unsupervised data mining of molecular data is under-explored. We found that embedding can extract biologically relevant information from The Cancer Genome Atlas (TCGA) gene expression dataset by learning a vector representation through gene co-occurrence. Ground-truth relationships, such as the cancer types of the input samples and the semantic meaning of genes, were shown to be retained in the resulting entity matrices. We also demonstrated the interpretability and use of these matrices in shortlisting candidates from a long gene list, as in the case of immunotherapy response. In total, 73 related genes were singled out, and the relatedness of 55 genes to immune checkpoint proteins (PD-1, PD-L1, and CTLA-4) is supported by the literature. Sixteen novel genes () related to immune checkpoint proteins were identified. Thus, this method is feasible for mining large volumes of biological data, and embedding would be a valuable tool for discovering novel knowledge from omics data. The resulting embedding matrices mined from TCGA gene expression data are interactively explorable online (http://bit.ly/tcga-embedding-cancer); they could serve as an informative reference for gene relatedness in the context of cancer and are readily applicable to biomarker discovery for any molecular targeted therapy.
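
The shortlisting idea in the abstract (ranking genes by how close their embedding vectors lie to a query gene) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which similarity measure they used, so cosine similarity is an assumption, and the function name `rank_by_cosine`, the gene labels, and the vectors are hypothetical placeholders rather than values from the TCGA embedding.

```python
import numpy as np


def rank_by_cosine(embeddings, genes, query, top_k=3):
    """Return the top_k genes closest to `query` by cosine similarity.

    `embeddings` is an (n_genes, dim) array whose rows align with `genes`.
    Cosine similarity is an assumed choice, not taken from the paper.
    """
    idx = genes.index(query)
    # Scale each row to unit length so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    # Sort from most to least similar and skip the query gene itself.
    order = np.argsort(-sims)
    ranked = [(genes[i], float(sims[i])) for i in order if i != idx]
    return ranked[:top_k]


# Hypothetical labels and placeholder vectors, NOT data from the paper.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])

shortlist = rank_by_cosine(embeddings, genes, "GENE_A", top_k=2)
for name, score in shortlist:
    print(f"{name}\t{score:.3f}")
```

In practice the query would be a gene of interest (for example one encoding an immune checkpoint protein) and the matrix would be a learned gene embedding; a similarity cutoff or a fixed `top_k` then turns a long gene list into a shortlist of candidates.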

Citing Articles

Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience.

Giordano M, Falbo E, Maddalena L, Piccirillo M, Granata I. Biomolecules. 2024; 14(1).

PMID: 38254618 PMC: 10813179. DOI: 10.3390/biom14010018.


Molecular data representation based on gene embeddings for cancer drug response prediction.

Park S, Lee H. Sci Rep. 2023; 13(1):21898.

PMID: 38081928 PMC: 10713675. DOI: 10.1038/s41598-023-49003-6.


Biology-aware mutation-based deep learning for outcome prediction of cancer immunotherapy with immune checkpoint inhibitors.

Liu J, Islam M, Sang S, Qiu L, Xing L. NPJ Precis Oncol. 2023; 7(1):117.

PMID: 37932419 PMC: 10628135. DOI: 10.1038/s41698-023-00468-8.


Domain-PFP allows protein function prediction using function-aware domain embedding representations.

Ibtehaz N, Kagaya Y, Kihara D. Commun Biol. 2023; 6(1):1103.

PMID: 37907681 PMC: 10618451. DOI: 10.1038/s42003-023-05476-9.


Auxiliary Diagnosis and Prognostic Value of Dehydrogenase/Reductase 2 (DHRS2) in Various Tumors.

An Z, Bo W, Qin J, Jiang L, Jiang J. Iran J Public Health. 2023; 52(6):1150-1160.

PMID: 37484140 PMC: 10362825. DOI: 10.18502/ijph.v52i6.12957.

