
Imputing Missing RNA-sequencing Data from DNA Methylation by Using a Transfer Learning-based Neural Network

Overview
Journal: GigaScience
Specialties: Biology, Genetics
Date: 2020 Jul 11
PMID: 32649756
Citations: 22
Abstract

Background: Gene expression plays a key intermediate role in linking molecular features at the DNA level to phenotype. However, owing to various experimental limitations, RNA-seq data are missing for many samples for which high-quality DNA methylation data exist. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data, and many methods have been developed for this purpose. A common limitation of these methods is that they mainly focus on a single cancer dataset and do not fully utilize information from large pan-cancer datasets.

Results: Here, we developed a novel method, TDimpute, that imputes missing gene expression data from DNA methylation data through a transfer learning-based neural network. In this method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. In tests on 16 cancer datasets, our method significantly outperformed other state-of-the-art methods in imputation accuracy, with a 7-11% improvement under different missing rates. The imputed gene expression was further shown to be useful for downstream analyses on the TCGA dataset, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project indicated that our method is generally applicable.
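
The pretrain-then-fine-tune idea described above can be illustrated with a minimal sketch. This is not the TDimpute implementation: the one-hidden-layer network, the synthetic data, the dimensions, the learning rates, and all names (`init_params`, `train`, `predict`, `mse`) are assumptions made for illustration only. A general model is fitted on a large synthetic "pan-cancer" set, then training continues on a small "target cancer" set, and the result is used to predict expression for samples that have methylation data only.

```python
import numpy as np


def init_params(n_in, n_hidden, n_out, rng):
    """Small random weights for a one-hidden-layer network (illustrative)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def predict(params, X):
    """Map methylation features X to predicted expression values."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"]


def mse(params, X, Y):
    """Mean squared error between predicted and observed expression."""
    return float(np.mean((predict(params, X) - Y) ** 2))


def train(params, X, Y, lr, epochs):
    """Full-batch gradient descent on MSE; returns an updated copy."""
    p = {k: v.copy() for k, v in params.items()}
    for _ in range(epochs):
        H = np.tanh(X @ p["W1"] + p["b1"])
        err = H @ p["W2"] + p["b2"] - Y
        g_out = 2.0 * err / err.size                  # dLoss/dPrediction
        g_hid = (g_out @ p["W2"].T) * (1.0 - H ** 2)  # backprop through tanh
        p["W2"] -= lr * (H.T @ g_out)
        p["b2"] -= lr * g_out.sum(axis=0)
        p["W1"] -= lr * (X.T @ g_hid)
        p["b1"] -= lr * g_hid.sum(axis=0)
    return p


rng = np.random.default_rng(0)
n_cpg, n_genes = 20, 5  # toy sizes, far smaller than real data

# Synthetic "pan-cancer" data: methylation values in [0, 1], and expression
# generated from a shared linear relationship plus noise (an assumption).
W_shared = rng.normal(size=(n_cpg, n_genes))
X_pan = rng.uniform(0.0, 1.0, (500, n_cpg))
Y_pan = X_pan @ W_shared + 0.1 * rng.normal(size=(500, n_genes))

# Synthetic "target cancer" data: few samples, slightly shifted relationship.
W_target = W_shared + 0.2 * rng.normal(size=(n_cpg, n_genes))
X_tgt = rng.uniform(0.0, 1.0, (40, n_cpg))
Y_tgt = X_tgt @ W_target + 0.1 * rng.normal(size=(40, n_genes))

# Step 1: train a general model on the large pan-cancer set.
general = train(init_params(n_cpg, 16, n_genes, rng), X_pan, Y_pan,
                lr=0.05, epochs=300)

# Step 2: fine-tune on the small target set, starting from the general
# weights and using a smaller learning rate.
fine_tuned = train(general, X_tgt, Y_tgt, lr=0.01, epochs=200)

# Step 3: impute expression for samples that only have methylation data.
X_missing = rng.uniform(0.0, 1.0, (10, n_cpg))
Y_imputed = predict(fine_tuned, X_missing)

print("target MSE, general model:   ", mse(general, X_tgt, Y_tgt))
print("target MSE, fine-tuned model:", mse(fine_tuned, X_tgt, Y_tgt))
```

Starting the second stage from the general weights, rather than from a fresh initialization, is what lets the small target set benefit from the larger one. A realistic evaluation would measure error on held-out target samples rather than on the fine-tuning set used here.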

Conclusions: TDimpute is an effective method for RNA-seq imputation with limited training samples.

Citing Articles

Optimizing multi-omics data imputation with NMF and GAN synergy.

Ansari M, Ahmed K, Zhang W Bioinformatics. 2024; 40(11).

PMID: 39546381 PMC: 11639186. DOI: 10.1093/bioinformatics/btae674.


Artificial intelligence applied to 'omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment.

Ghosh S, Zhao X, Alim M, Brudno M, Bhat M Gut. 2024; 74(2):295-311.

PMID: 39174307 PMC: 11874365. DOI: 10.1136/gutjnl-2023-331740.


Data Augmentation with Cross-Modal Variational Autoencoders (DACMVA) for Cancer Survival Prediction.

Rajaram S, Mitchell C Information (Basel). 2024; 15(1).

PMID: 38665395 PMC: 11044918. DOI: 10.3390/info15010007.


CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data.

Zhao C, Liu A, Zhang X, Cao X, Ding Z, Sha Q Comput Biol Med. 2024; 170:108058.

PMID: 38295477 PMC: 10959569. DOI: 10.1016/j.compbiomed.2024.108058.


Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment.

Zhang C, Xu J, Tang R, Yang J, Wang W, Yu X J Hematol Oncol. 2023; 16(1):114.

PMID: 38012673 PMC: 10680201. DOI: 10.1186/s13045-023-01514-5.

