Integrating Multi-omics Data Through Deep Learning for Accurate Cancer Prognosis Prediction
Background: Genomic information is now widely used to guide precision cancer treatment. Because any single type of omics data represents only one view and suffers from noise and bias, multiple types of omics data are needed for accurate cancer prognosis prediction. However, effectively integrating multi-omics data is challenging because of the large number of redundant variables and the relatively small sample sizes. With recent progress in deep learning, autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are sensitive to data noise. Additionally, previous studies usually focused on individual cancer types without comprehensive pan-cancer tests. Here, we employed a denoising autoencoder to obtain a robust representation of the multi-omics data, and then used the learned representative features to estimate patients' risks.
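To make the denoising idea concrete, the following is a minimal sketch, not the authors' architecture. It assumes a single hidden layer, masking noise (randomly zeroed input features), a tanh encoder, a linear decoder, and plain gradient descent. The function name `train_denoising_autoencoder`, all hyperparameters, and the synthetic low-rank data are hypothetical choices made only for illustration. The key point is that the network sees a corrupted input but is scored against the clean one.

```python
import numpy as np


def train_denoising_autoencoder(x, n_hidden=8, drop_prob=0.2, lr=0.2,
                                epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder (illustrative only).

    Returns an `encode` function and the per-epoch reconstruction losses.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input with masking noise: each entry is zeroed
        # with probability drop_prob (an assumed noise model).
        mask = rng.random(x.shape) >= drop_prob
        x_noisy = x * mask
        # Forward pass: encode the corrupted input, decode linearly.
        h = np.tanh(x_noisy @ w1 + b1)
        x_hat = h @ w2 + b2
        # The error is measured against the CLEAN input.
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        g_out = 2.0 * err / err.size
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = x_noisy.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        # Gradient descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def encode(x_new):
        # Learned low-dimensional representation of (uncorrupted) inputs.
        return np.tanh(x_new @ w1 + b1)

    return encode, losses


# Synthetic stand-in for concatenated omics features: 200 samples,
# 20 features driven by 2 hidden factors plus a little noise.
data_rng = np.random.default_rng(42)
latent = data_rng.normal(size=(200, 2))
mixing = data_rng.normal(size=(2, 20))
x = latent @ mixing + 0.1 * data_rng.normal(size=(200, 20))
x = (x - x.mean(axis=0)) / x.std(axis=0)

encode, losses = train_denoising_autoencoder(x)
features = encode(x)  # shape (200, 8): inputs for a downstream risk model
```

In a pipeline like the one described, `features` would play the role of the learned representation passed to a survival model; how the authors estimate risk from those features is not specified here.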
Results: Applied to 15 cancers from The Cancer Genome Atlas (TCGA), our method improved C-index values over previous methods by 6.5% on average. Because multi-omics data are difficult to obtain in practice, we further trained XGBoost models to fit the estimated risks using only mRNA data, and these models achieved an average C-index of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and was shown to significantly separate high-risk patients from low-risk ones (C-index > 0.6, p-values < 0.05). Based on the risk subgroups defined by our method, we identified nine prognostic markers highly associated with breast cancer, seven of which have been confirmed by literature review.
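Since the results are reported as C-index values, a small sketch of how a concordance index can be computed may help. This is the standard Harrell-style definition written in plain Python, not the authors' evaluation code. The function name `concordance_index` and the toy data are hypothetical. A pair of patients is comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that patient also has the higher predicted risk, and tied risks count as half.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: risk scores that rank patients perfectly by survival time.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported average of 0.627 and the C-index > 0.6 threshold on the GEO datasets.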
Conclusion: Our comprehensive tests indicate that we have constructed an accurate and robust framework for integrating multi-omics data for cancer prognosis prediction. Moreover, the framework is an effective way to discover cancer prognosis-related genes.