Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations

Overview
Journal Biomed Res Int
Publisher Wiley
Date 2015 Nov 6
PMID 26539502
Citations 7
Abstract

An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for the development of effective therapeutic strategies. The large volume of available somatic mutation data provides a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified in 230,255 cancer patients, together with gene function information, using a support vector machine. Specifically, we performed a multiclass classification experiment over 17 tumor sites for 6,751 subjects, using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors. The baseline using only gene features achieved an accuracy of 0.57, which improved to 0.62 when mutation and chromosome information was added. Among the predictable primary tumor sites, five (large intestine, liver, skin, pancreas, and lung) achieved an F-measure above 0.70. The large intestine model ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning models. To our knowledge, this study is the first investigation of primary site classification using machine learning and somatic mutation data.
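The abstract reports results as overall accuracy and per-site F-measure. The following is a minimal sketch, assuming the standard one-vs-rest definitions (the abstract does not spell them out): precision P and recall R are computed per primary site, and F = 2PR / (P + R). The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted primary site equals the true site."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure_per_site(y_true, y_pred):
    """One-vs-rest F-measure for each primary site: F = 2PR / (P + R)."""
    sites = sorted(set(y_true) | set(y_pred))
    scores = {}
    for site in sites:
        # Count true positives, false positives and false negatives for this site.
        tp = sum(t == site and p == site for t, p in zip(y_true, y_pred))
        fp = sum(t != site and p == site for t, p in zip(y_true, y_pred))
        fn = sum(t == site and p != site for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # Define F as 0 when both precision and recall are 0.
        scores[site] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy labels for six subjects (not data from the study).
    y_true = ["lung", "lung", "liver", "skin", "large_intestine", "large_intestine"]
    y_pred = ["lung", "liver", "liver", "skin", "large_intestine", "lung"]
    print(round(accuracy(y_true, y_pred), 3))  # prints 0.667
    print({s: round(f, 3) for s, f in f_measure_per_site(y_true, y_pred).items()})
    # prints {'large_intestine': 0.667, 'liver': 0.667, 'lung': 0.5, 'skin': 1.0}
```

Computing F per site, rather than only overall accuracy, is what allows a statement such as "five primary sites exceeded 0.70 in F-measure": a site can score well even when overall accuracy is modest.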

Citing Articles

Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations.

Zelli V, Manno A, Compagnoni C, Ibraheem R, Zazzeroni F, Alesse E. J Transl Med. 2023; 21(1):836.

PMID: 37990214 PMC: 10664515. DOI: 10.1186/s12967-023-04720-4.


Secure tumor classification by shallow neural network using homomorphic encryption.

Hong S, Park J, Cho W, Choe H, Cheon J. BMC Genomics. 2022; 23(1):284.

PMID: 35395714 PMC: 8994372. DOI: 10.1186/s12864-022-08469-w.


Deep learning in cancer diagnosis, prognosis and treatment selection.

Tran K, Kondrashova O, Bradley A, Williams E, Pearson J, Waddell N. Genome Med. 2021; 13(1):152.

PMID: 34579788 PMC: 8477474. DOI: 10.1186/s13073-021-00968-x.


Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods.

He Z, Zhang J, Yuan X, Zhang Y. Front Genet. 2021; 11:632901.

PMID: 33537063 PMC: 7848170. DOI: 10.3389/fgene.2020.632901.


A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns.

Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, Danyi A. Nat Commun. 2020; 11(1):728.

PMID: 32024849 PMC: 7002586. DOI: 10.1038/s41467-019-13825-8.

