A Novel Gene Selection Algorithm for Cancer Classification Using Microarray Datasets

Overview
Publisher Biomed Central
Specialty Genetics
Date 2019 Jan 17
PMID 30646919
Citations 19
Abstract

Background: Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a small sample size relative to the large number of genes involved. This imbalance, known as the curse of dimensionality, makes classification challenging. Gene selection is a promising approach to this problem and plays an important role in developing efficient cancer classifiers, because only a small number of genes are relevant to the classification task. It reduces the number of irrelevant and noisy genes and selects the most relevant genes to improve classification results.

Methods: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier for GSP.
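The abstract does not give the GSP fitness function, so the following is only a minimal sketch of how a wrapper-style gene selection method might score one candidate gene subset. Everything in it is an assumption made for illustration: the function names, the `weight` parameter, the toy data, and the form of the score (cross-validated accuracy plus a small reward for using fewer genes). To keep the example dependency-free, a nearest-centroid classifier stands in for the linear-kernel SVM named in the abstract.

```python
import random

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean).

    Stand-in classifier for illustration; the abstract uses a linear-kernel SVM.
    """
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((v / counts[label] - xj) ** 2 for v, xj in zip(acc, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy using only the gene columns listed in `genes`."""
    if not genes:
        return 0.0
    reduced = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(reduced)):
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, reduced[i]) == y[i]
    return hits / len(reduced)

def subset_fitness(X, y, genes, weight=0.9):
    """Hypothetical fitness: mostly accuracy, with a small bonus for fewer genes."""
    n_genes = len(X[0])
    accuracy = loo_accuracy(X, y, genes)
    sparsity = 1.0 - len(genes) / n_genes
    return weight * accuracy + (1.0 - weight) * sparsity

def make_toy_data(n_samples=20, n_genes=50, seed=0):
    """Toy data: gene 0 separates the two classes; every other gene is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] = rng.gauss(5.0 if label else -5.0, 0.5)
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
informative = subset_fitness(X, y, [0])           # one relevant gene
all_genes = subset_fitness(X, y, list(range(50)))  # no selection at all
print(informative > all_genes)
```

Under this assumed score, a subset containing only the informative gene outranks the full gene set, because the full set earns no sparsity bonus. An evolutionary search such as GEP would call a function like `subset_fitness` once per candidate, which is why the number of selected genes and the computational cost are evaluated alongside accuracy.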

Results: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise across all three evaluation criteria: classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP shows superior performance in cancer classification compared with those selected by up-to-date representative gene selection methods.

Conclusion: The gene subset selected by GSP can achieve higher classification accuracy with less processing time.

Citing Articles

Prediction of in-hospital mortality risk for patients with acute ST-elevation myocardial infarction after primary PCI based on predictors selected by GRACE score and two feature selection methods.

Tang N, Liu S, Li K, Zhou Q, Dai Y, Sun H Front Cardiovasc Med. 2024; 11:1419551.

PMID: 39502196 PMC: 11534735. DOI: 10.3389/fcvm.2024.1419551.


Machine learning for pan-cancer classification based on RNA sequencing data.

Stancl P, Karlic R Front Mol Biosci. 2023; 10:1285795.

PMID: 38028533 PMC: 10667476. DOI: 10.3389/fmolb.2023.1285795.


FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms.

Mowlaei M, Shi X Genes (Basel). 2023; 14(5).

PMID: 37239419 PMC: 10218676. DOI: 10.3390/genes14051059.


Identification of Potential Biomarkers for Group I Pulmonary Hypertension Based on Machine Learning and Bioinformatics Analysis.

Hu H, Cai J, Qi D, Li B, Yu L, Wang C Int J Mol Sci. 2023; 24(9).

PMID: 37175757 PMC: 10178909. DOI: 10.3390/ijms24098050.


A Novel Hybrid Runge Kutta Optimizer with Support Vector Machine on Gene Expression Data for Cancer Classification.

Houssein E, Hassan H, Samee N, Jamjoom M Diagnostics (Basel). 2023; 13(9).

PMID: 37175012 PMC: 10178557. DOI: 10.3390/diagnostics13091621.

