
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles

Overview
Journal: Cancers (Basel)
Publisher: MDPI
Specialty: Oncology
Date: 2021 Aug 7
PMID: 34359669
Citations: 13
Abstract

Metastatic cancers account for up to 90% of cancer-related deaths. Clearly differentiating metastatic cancers from primary cancers is crucial for identifying the cancer type and for developing targeted treatment for each type. DNA methylation patterns have been suggested as a promising target for cancer prediction and are also considered an important mediator of the transition to metastatic cancer. In the present study, we analyzed 9303 methylome samples spanning 24 cancer types, downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate between metastatic, primary, and non-cancerous methylome samples. We applied support vector machine (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to identify the methylation biomarkers that were important for classifying cancer types.
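
To make the classification setup concrete, the following is a minimal sketch of one of the model families named in the abstract, Naive Bayes, applied to methylation beta values (per-CpG methylation fractions between 0 and 1). It is not the authors' pipeline: the tissue labels, the five CpG sites, their mean beta values, the noise level, and all function names are assumptions made up for illustration, and the data are synthetic rather than drawn from TCGA or GEO. The sketch uses only the Python standard library.

```python
import math
import random
from collections import defaultdict

# Hypothetical mean beta values at five CpG sites for three tissue labels.
# These numbers are invented for illustration and are not taken from the study.
CLASS_MEANS = {
    "breast": [0.85, 0.15, 0.20, 0.80, 0.50],
    "colon": [0.15, 0.85, 0.25, 0.20, 0.50],
    "lung": [0.20, 0.20, 0.85, 0.75, 0.50],
}


def make_samples(rng, n_per_class, class_means, sd=0.08):
    """Draw synthetic beta values, clipped to [0, 1], for each tissue label."""
    samples = []
    for label, means in class_means.items():
        for _ in range(n_per_class):
            betas = [min(1.0, max(0.0, rng.gauss(m, sd))) for m in means]
            samples.append((betas, label))
    return samples


def fit_gaussian_nb(samples, var_floor=1e-4):
    """Estimate a log prior plus per-CpG mean and variance for each label."""
    by_label = defaultdict(list)
    for betas, label in samples:
        by_label[label].append(betas)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a near-constant CpG cannot cause division by zero.
        variances = [
            max(var_floor, sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(samples)), means, variances)
    return model


def predict(model, betas):
    """Return the label with the highest Gaussian log posterior."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(betas, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, betas) == label for betas, label in samples)
    return hits / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_samples(rng, 40, CLASS_MEANS)
    held_out = make_samples(rng, 20, CLASS_MEANS)
    model = fit_gaussian_nb(train)
    print(f"held-out accuracy: {accuracy(model, held_out):.2f}")
```

Because the synthetic classes are deliberately well separated, this toy setup says nothing about how the classifiers compare on real methylomes. It only illustrates the shape of the task: a vector of beta values goes in, and a tissue-of-origin label comes out.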

Citing Articles

Accurate identification of primary site in tumors of unknown origin (TUO) using DNA methylation.

Duckett D, Vormittag-Nocito E, Jamshidi P, Sukhanova M, Parker S, Brat D. NPJ Precis Oncol. 2025; 9(1):8.

PMID: 39789204 PMC: 11718252. DOI: 10.1038/s41698-025-00805-z.


Evaluation of agreement between common clustering strategies for DNA methylation-based subtyping of breast tumours.

Zarean E, Li S, Wong E, Makalic E, Milne R, Giles G. Epigenomics. 2024; 17(2):105-114.

PMID: 39711216 PMC: 11792870. DOI: 10.1080/17501911.2024.2441653.


Gut Microbiota as Mediator and Moderator Between Hepatitis B Virus and Hepatocellular Carcinoma: A Prospective Study.

Hu B, Yang Y, Yao J, Lin G, He Q, Bo Z. Cancer Med. 2024; 13(24):e70454.

PMID: 39702929 PMC: 11659115. DOI: 10.1002/cam4.70454.


PathMethy: an interpretable AI framework for cancer origin tracing based on DNA methylation.

Xie J, Song Y, Zheng H, Luo S, Chen Y, Zhang C. Brief Bioinform. 2024; 25(6).

PMID: 39391931 PMC: 11467402. DOI: 10.1093/bib/bbae497.


Stanniocalcin Protein Expression in Female Reproductive Organs: Literature Review and Public Cancer Database Analysis.

Khatun M, Modhukur V, Piltonen T, Tapanainen J, Salumets A. Endocrinology. 2024; 165(10).

PMID: 39186548 PMC: 11398916. DOI: 10.1210/endocr/bqae110.

