PMID: 33039710

CUP-AI-Dx: A Tool for Inferring Cancer Tissue of Origin and Molecular Subtype Using RNA Gene-expression Data and Artificial Intelligence

Abstract

Background: Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients.

Methods: We developed an RNA-based classifier called CUP-AI-Dx that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained on the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas (TCGA) project and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by gene chromosomal coordinates as input to the 1D-CNN model, which applies multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimised through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that classifies the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue of origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and on 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets.

Findings: CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated at two institutes, one in the US and one in Australia, our model predicted the primary site with a top-1 accuracy of 86.96% and 72.46%, respectively.
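The input ordering and parallel-kernel idea described in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the CUP-AI-Dx implementation: the gene names, coordinates, kernel weights, pool size, and all function names below are hypothetical, and the kernels are fixed by hand where a trained network would learn them across many channels and layers.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed chromosome ranking for this illustration only (1-22, then X, Y).
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24})


def order_by_coordinate(expr: Dict[str, float],
                        coords: Dict[str, Tuple[str, int]]) -> List[float]:
    """Return expression values sorted by (chromosome, start position)."""
    genes = sorted(expr, key=lambda g: (CHROM_ORDER[coords[g][0]], coords[g][1]))
    return [expr[g] for g in genes]


def conv1d_relu(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid-mode 1D cross-correlation followed by a ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def max_pool(signal: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


def inception_block(signal: Sequence[float],
                    kernels: Sequence[Sequence[float]],
                    pool_size: int) -> List[float]:
    """Apply every kernel to the same input and concatenate the pooled outputs."""
    features: List[float] = []
    for kernel in kernels:
        features.extend(max_pool(conv1d_relu(signal, kernel), pool_size))
    return features


# Toy data: six made-up genes with made-up coordinates.
expr = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.0,
        "GENE_D": 3.0, "GENE_E": 0.0, "GENE_F": 1.5}
coords = {"GENE_A": ("2", 100), "GENE_B": ("1", 500), "GENE_C": ("1", 50),
          "GENE_D": ("X", 10), "GENE_E": ("2", 900), "GENE_F": ("10", 5)}

ordered = order_by_coordinate(expr, coords)
# Two branches with different kernel widths, applied to the same input.
features = inception_block(ordered, kernels=[[1.0, -1.0], [0.5, 0.5, 0.5]],
                           pool_size=2)
print(ordered)   # [1.0, 0.5, 2.0, 0.0, 1.5, 3.0]
print(features)  # [0.5, 2.0, 1.75, 2.25]
```

Ordering by coordinate places genes that are neighbours on a chromosome next to each other in the input vector, so kernels of different widths summarise neighbourhoods of different sizes. In a full classifier, the concatenated features would feed further layers and a softmax over cancer types.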

Interpretation: CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform.

Funding: NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.

Citing Articles

Clinical Applications of Artificial Intelligence (AI) in Human Cancer: Is It Time to Update the Diagnostic and Predictive Models in Managing Hepatocellular Carcinoma (HCC)?

Romeo M, Dallio M, Napolitano C, Basile C, Di Nardo F, Vaia P. Diagnostics (Basel). 2025; 15(3).

PMID: 39941182 PMC: 11817573. DOI: 10.3390/diagnostics15030252.


Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data.

Gehrmann J, Soenarto D, Hidayat K, Beyer M, Quakulinski L, Alkarkoukly S. Front Med (Lausanne). 2024; 11:1396459.

PMID: 39257886 PMC: 11385615. DOI: 10.3389/fmed.2024.1396459.


XENTURION is a population-level multidimensional resource of xenografts and tumoroids from metastatic colorectal cancer patients.

Leto S, Grassi E, Avolio M, Vurchio V, Cottino F, Ferri M. Nat Commun. 2024; 15(1):7495.

PMID: 39209908 PMC: 11362617. DOI: 10.1038/s41467-024-51909-2.


Occlusion enhanced pan-cancer classification via deep learning.

Zhao X, Chen Z, Wang H, Sun H. BMC Bioinformatics. 2024; 25(1):260.

PMID: 39118043 PMC: 11308240. DOI: 10.1186/s12859-024-05870-y.


Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer.

Jeong Y, Chu J, Kang J, Baek S, Lee J, Jung D. Curr Issues Mol Biol. 2024; 46(7):7291-7302.

PMID: 39057073 PMC: 11276602. DOI: 10.3390/cimb46070432.

