Gene Selection for Sample Classification Based on Gene Expression Data: Study of Sensitivity to Choice of Parameters of the GA/KNN Method

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2001 Dec 26
PMID: 11751221
Citations: 74
Abstract

Motivation: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm.

Methods: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples.
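As a rough illustration of how a GA can be paired with KNN for gene selection, the following minimal sketch evolves small gene subsets and scores each one by leave-one-out k-nearest-neighbor accuracy. It is not the authors' implementation: the function names (`knn_loo_accuracy`, `ga_knn_select`), the fitness definition, the mutation-only update, and all default parameter values are assumptions made for this example.

```python
import random


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out k-NN accuracy using only the columns listed in `genes`.

    X is a list of samples (each a list of expression values) and y holds
    the class labels. This fitness definition is an assumption of the sketch.
    """
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample,
        # restricted to the candidate gene subset.
        dists = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X))
            if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def ga_knn_select(X, y, subset_size=3, pop_size=20, generations=30, k=3, seed=0):
    """Search for a gene subset with high leave-one-out k-NN accuracy.

    Simplified GA (hypothetical settings): keep the better half of the
    population each generation and refill it with single-gene mutants.
    """
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(genes):
        return knn_loo_accuracy(X, y, genes, k)

    # Each individual ("chromosome") is a list of distinct gene indices.
    population = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) == 1.0:
            break  # a subset that classifies every sample correctly was found
        survivors = ranked[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            # Mutation: swap one gene for a gene not already in the subset.
            pos = rng.randrange(subset_size)
            candidates = [g for g in range(n_genes) if g not in child]
            child[pos] = rng.choice(candidates)
            children.append(child)
        population = survivors + children

    best = max(population, key=fitness)
    return sorted(best), fitness(best)
```

Calling `ga_knn_select(X, y)` on a training matrix returns one selected subset and its training-set fitness. Because the search is stochastic, different seeds can return different subsets, which is why results are usually examined across several independent runs.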

Results: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification.
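One simple way to look at repeatability of gene selection across independent runs is to count how often each gene is selected and to measure the overlap between the gene sets from different runs. The sketch below is only an illustration of that idea and is not the analysis reported in the paper; the helper names (`selection_frequency`, `mean_jaccard`) and the choice of the Jaccard index as the overlap measure are assumptions.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(runs):
    """Count in how many runs each gene index was selected."""
    counts = Counter()
    for genes in runs:
        counts.update(set(genes))  # count a gene at most once per run
    return counts


def mean_jaccard(runs):
    """Average pairwise Jaccard overlap between the gene sets of the runs."""
    pairs = list(combinations([set(r) for r in runs], 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty selections are treated as identical (overlap of 1).
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)
```

Here `runs` is assumed to be a list with one collection of selected gene indices per independent run. Genes with counts close to the number of runs, together with a mean overlap near 1, would indicate highly repeatable selection.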

Availability: The method is available at http://dir.niehs.nih.gov/microarray/datamining

Contact: LI3@niehs.nih.gov

Citing Articles

A gene selection algorithm for microarray cancer classification using an improved particle swarm optimization.

Nagra A, Haider Khan A, Abubakar M, Faheem M, Rasool A, Masood K Sci Rep. 2024; 14(1):19613.

PMID: 39179674 PMC: 11343852. DOI: 10.1038/s41598-024-68744-6.


Elucidating prognosis in cervical squamous cell carcinoma and endocervical adenocarcinoma: a novel anoikis-related gene signature model.

Wang M, Ying Q, Ding R, Xing Y, Wang J, Pan Y Front Oncol. 2024; 14:1352638.

PMID: 38988712 PMC: 11234598. DOI: 10.3389/fonc.2024.1352638.


Classification and selection of the main features for the identification of toxicity in and with machine learning algorithms.

Ortiz-Letechipia J, Galvan-Tejada C, Galvan-Tejada J, Soto-Murillo M, Acosta-Cruz E, Gamboa-Rosales H PeerJ. 2024; 12:e16501.

PMID: 38223762 PMC: 10785791. DOI: 10.7717/peerj.16501.


Transcriptional profiles reveal histologic origin and prognosis across 33 The Cancer Genome Atlas tumor types.

Xiao H, Hu L, Tan Q, Jia J, Xie P, Li J Transl Cancer Res. 2023; 12(10):2764-2780.

PMID: 37969389 PMC: 10643977. DOI: 10.21037/tcr-23-234.


Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction.

Andersson B, Langen B, Liu P, Lopez M Front Oncol. 2023; 13:1156009.

PMID: 37256187 PMC: 10225714. DOI: 10.3389/fonc.2023.1156009.