Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review

Overview
Date 2023 Feb 25
PMID 36829667
Abstract

Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread to different parts of the body. According to the World Health Organization (WHO), cancer is the second leading cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissues and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods quantify the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on deep learning models because of their comparative advantages in identifying gene patterns that are distinctive of various types of cancer. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning on this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, which results from the large number of genes present in each data sample. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
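
The abstract describes reducing the dimensionality of gene expression data before classification. The following is a minimal sketch of that general idea, not a method taken from the review: it assumes synthetic data (samples as rows, genes as columns), uses a simple variance filter as the feature selection step, and a nearest-centroid rule as the classifier. All function names, parameters, and the data generator are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pvariance


def make_synthetic_expression(n_per_class=20, n_genes=200, n_informative=5, seed=0):
    """Build a toy expression matrix (hypothetical data, not a real dataset)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                # Shift the first n_informative genes so the two classes differ.
                for g in range(n_informative):
                    row[g] += 3.0
            samples.append(row)
            labels.append(label)
    return samples, labels


def top_variance_genes(samples, k):
    """Return the indices of the k genes with the highest variance across samples."""
    n_genes = len(samples[0])
    variances = [pvariance([row[g] for row in samples]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]


def fit_centroids(samples, labels, genes):
    """Compute the mean expression of the selected genes for each class."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


if __name__ == "__main__":
    train_x, train_y = make_synthetic_expression(seed=0)
    test_x, test_y = make_synthetic_expression(seed=1)
    # Select genes on the training data only, then reuse them for the test data.
    genes = top_variance_genes(train_x, k=5)
    centroids = fit_centroids(train_x, train_y, genes)
    correct = sum(predict(s, centroids, genes) == y for s, y in zip(test_x, test_y))
    print(f"selected genes: {sorted(genes)}")
    print(f"test accuracy: {correct / len(test_y):.2f}")
```

Selecting genes on the training samples alone keeps the held-out samples from influencing the feature set. Real studies would typically replace both the filter and the classifier with the more capable methods the review surveys.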

Citing Articles

Hallmarks of artificial intelligence contributions to precision oncology.

Chang T, Park S, Schaffer A, Jiang P, Ruppin E. Nat Cancer. 2025.

PMID: 40055572. DOI: 10.1038/s43018-025-00917-2.


Breast cancer prediction based on gene expression data using interpretable machine learning techniques.

Kallah-Dagadu G, Mohammed M, Nasejje J, Mchunu N, Twabi H, Batidzirai J. Sci Rep. 2025; 15(1):7594.

PMID: 40038307 PMC: 11880515. DOI: 10.1038/s41598-025-85323-5.


Inhibition of CDC27 O-GlcNAcylation coordinates the antitumor efficacy in multiple myeloma through the autophagy-lysosome pathway.

Wu H, Qin R, Li W, Liu J, Deng C, Zheng Z. Acta Pharmacol Sin. 2025.

PMID: 39984622. DOI: 10.1038/s41401-025-01500-2.


A multi-classification deep neural network for cancer type identification from high-dimension, small-sample and imbalanced gene microarray data.

Zeng Y, Zhang Y, Xiao Z, Sui H. Sci Rep. 2025; 15(1):5239.

PMID: 39939378 PMC: 11822135. DOI: 10.1038/s41598-025-89475-2.


A comparative analysis of gene expression profiling by statistical and machine learning approaches.

Bontonou M, Haget A, Boulougouri M, Audit B, Borgnat P, Arbona J. Bioinform Adv. 2025; 5(1):vbae199.

PMID: 39897946 PMC: 11783302. DOI: 10.1093/bioadv/vbae199.

