A Deep Learning Model to Classify Neoplastic State and Tissue Origin from Transcriptomic Data

Overview
Journal Sci Rep
Specialty Science
Date 2022 Jun 11
PMID 35690622
Abstract

Application of deep learning methods to transcriptomic data has the potential to enhance the accuracy and efficiency of tissue classification and cell state identification. Herein, we developed a multitask deep learning model for tissue classification combining publicly available whole transcriptomic (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from a total of 10,116 patient samples processed through a common pipeline were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98) and 97% accuracy for tissue origin (ROC-AUC of 0.99). Moreover, the model achieved an accuracy of 92% (ROC-AUC 0.95) for neoplastic subclassification. This is the first multitask deep learning algorithm developed for tissue classification employing a uniform pipeline analysis of transcriptomic data with multiple tissue classifiers. This model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
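The multitask idea in the abstract, one model that predicts disease state, tissue origin and neoplastic subclass together, can be illustrated with a minimal sketch. The abstract does not describe the architecture, so everything below is an assumption made for illustration: a single shared hidden layer, one softmax head per task, a loss that simply sums the per-task cross-entropies, and the hypothetical names `MultitaskClassifier`, `multitask_loss` and `HEAD_SIZES`. The class counts for tissue origin and subclass are placeholders; only the three disease states (non-neoplastic, neoplastic, peri-neoplastic) come from the text.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskClassifier:
    """Shared encoder feeding one softmax head per task (assumed design)."""

    def __init__(self, n_genes, n_hidden, head_sizes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_hidden, n_genes, rng)
        self.heads = {task: init_layer(n_classes, n_hidden, rng)
                      for task, n_classes in head_sizes.items()}

    def forward(self, expression):
        # The hidden representation is computed once and reused by every head.
        hidden = relu(linear(expression, *self.encoder))
        return {task: softmax(linear(hidden, *params))
                for task, params in self.heads.items()}


def multitask_loss(probs, labels):
    """Sum of per-task cross-entropies for one sample (assumed weighting)."""
    return sum(-math.log(probs[task][labels[task]]) for task in labels)


# Hypothetical sizes: 3 disease states follow the abstract, the rest are placeholders.
HEAD_SIZES = {"disease_state": 3, "tissue_origin": 5, "neoplastic_subclass": 4}

model = MultitaskClassifier(n_genes=8, n_hidden=4, head_sizes=HEAD_SIZES)
sample_rng = random.Random(1)
sample = [sample_rng.random() for _ in range(8)]  # toy expression vector

predictions = model.forward(sample)
loss = multitask_loss(
    predictions,
    {"disease_state": 0, "tissue_origin": 2, "neoplastic_subclass": 1},
)
```

Sharing the encoder means that gradients from all three tasks would shape the same representation during training, which is the usual motivation for a multitask design. Training, regularisation and the common RNA-seq processing pipeline mentioned in the abstract are outside the scope of this sketch.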

Citing Articles

A variational autoencoder trained with priors from canonical pathways increases the interpretability of transcriptome data.

Liu B, Rosenhahn B, Illig T, DeLuca D. PLoS Comput Biol. 2024; 20(7):e1011198.

PMID: 38959284 PMC: 11251626. DOI: 10.1371/journal.pcbi.1011198.


New techniques to identify the tissue of origin for cancer of unknown primary in the era of precision medicine: progress and challenges.

Ma W, Wu H, Chen Y, Xu H, Jiang J, Du B. Brief Bioinform. 2024; 25(2).

PMID: 38343328 PMC: 10859692. DOI: 10.1093/bib/bbae028.


Machine learning for pan-cancer classification based on RNA sequencing data.

Stancl P, Karlic R. Front Mol Biosci. 2023; 10:1285795.

PMID: 38028533 PMC: 10667476. DOI: 10.3389/fmolb.2023.1285795.


The practical utility of AI-assisted molecular profiling in the diagnosis and management of cancer of unknown primary: an updated review.

Lorkowski S, Dermawan J, Rubin B. Virchows Arch. 2023; 484(2):369-375.

PMID: 37999736 DOI: 10.1007/s00428-023-03708-1.
