CPEM: Accurate Cancer Type Classification Based on Somatic Alterations Using an Ensemble of a Random Forest and a Deep Neural Network

Overview
Journal: Sci Rep
Specialty: Science
Date: 2019 Nov 16
PMID: 31729414
Citations: 15
Abstract

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
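The abstract names two techniques without detailing them: combining a random forest with a deep neural network, and evaluating with nested 10-fold cross-validation. The sketch below illustrates both in plain Python under stated assumptions. It assumes the ensemble averages the two models' class probabilities (soft voting), which the abstract does not confirm, and it uses simple interleaved folds rather than the shuffled, stratified folds a real study would likely use. The names `soft_vote`, `k_folds`, `nested_folds`, and the probability values are hypothetical and are not taken from the CPEM code.

```python
from typing import Iterator, List, Sequence, Tuple


def soft_vote(prob_a: Sequence[Sequence[float]],
              prob_b: Sequence[Sequence[float]],
              weight_a: float = 0.5) -> List[int]:
    """Blend two classifiers' class-probability rows; return the argmax label per sample.

    Assumption: the ensemble combines models by (weighted) probability averaging.
    """
    labels = []
    for row_a, row_b in zip(prob_a, prob_b):
        mixed = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(row_a, row_b)]
        labels.append(max(range(len(mixed)), key=mixed.__getitem__))
    return labels


def k_folds(indices: Sequence[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists for k interleaved folds (no shuffling or stratification)."""
    for fold in range(k):
        held_out = [idx for pos, idx in enumerate(indices) if pos % k == fold]
        train = [idx for pos, idx in enumerate(indices) if pos % k != fold]
        yield train, held_out


def nested_folds(n_samples: int, outer_k: int = 10,
                 inner_k: int = 10) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (inner_train, validation, test) index triples for nested cross-validation.

    Feature and model selection would see only inner_train and validation;
    each outer test fold is held back for the final accuracy estimate.
    """
    for outer_train, test in k_folds(list(range(n_samples)), outer_k):
        for inner_train, validation in k_folds(outer_train, inner_k):
            yield inner_train, validation, test


# Hypothetical per-class probabilities for two samples and three cancer types.
rf_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
dnn_probs = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
print(soft_vote(rf_probs, dnn_probs))  # prints [1, 2]
```

In the first sample the random forest alone would pick class 0, but the averaged probabilities favor class 1, which is the kind of disagreement an ensemble is meant to resolve. Keeping the outer test fold out of every inner loop is what keeps a nested cross-validation accuracy estimate from being inflated by feature selection.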

Citing Articles

Classification performance assessment for imbalanced multiclass data.

Aguilar-Ruiz J, Michalak M. Sci Rep. 2024; 14(1):10759.

PMID: 38730045 PMC: 11087593. DOI: 10.1038/s41598-024-61365-z.


Deep-Learning Model for Tumor-Type Prediction Using Targeted Clinical Genomic Sequencing Data.

Darmofal M, Suman S, Atwal G, Toomey M, Chen J, Chang J. Cancer Discov. 2024; 14(6):1064-1081.

PMID: 38416134 PMC: 11145170. DOI: 10.1158/2159-8290.CD-23-0996.


Integrative analyses and validation of ferroptosis-related genes and mechanisms associated with cerebrovascular and cardiovascular ischemic diseases.

Liao W, Wen Y, Zeng C, Yang S, Duan Y, He C. BMC Genomics. 2023; 24(1):731.

PMID: 38049739 PMC: 10694919. DOI: 10.1186/s12864-023-09829-w.


Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations.

Zelli V, Manno A, Compagnoni C, Ibraheem R, Zazzeroni F, Alesse E. J Transl Med. 2023; 21(1):836.

PMID: 37990214 PMC: 10664515. DOI: 10.1186/s12967-023-04720-4.


Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping.

Sanjaya P, Maljanen K, Katainen R, Waszak S, Aaltonen L, Stegle O. Genome Med. 2023; 15(1):47.

PMID: 37420249 PMC: 10326961. DOI: 10.1186/s13073-023-01204-4.

