
Selection Bias in Gene Extraction on the Basis of Microarray Gene-expression Data

Overview
Specialty Science
Date 2002 May 2
PMID 11983868
Citations 335
Abstract

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type, each containing expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance either because the rule is tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
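The distinction between internal and external cross-validation can be illustrated with a minimal sketch. It is not the authors' code: the signal-to-noise gene score, the nearest-centroid classifier, the pure-noise synthetic data, and every function name below are assumptions chosen only to keep the example short and dependency-free. The single difference between the two variants is whether gene selection is repeated on the training part of each fold (external) or done once on all samples before cross-validating (internal).

```python
import random


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5


def select_genes(X, y, k):
    """Return indices of the k genes with the largest signal-to-noise score.

    The score (assumed here for illustration) is |mean_0 - mean_1| / (sd_0 + sd_1).
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (std(a) + std(b) + 1e-12))
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Compute class centroids restricted to the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean([row[g] for row in rows]) for g in genes]
    return centroids


def predict(centroids, genes, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist(label):
        return sum((x[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min(centroids, key=dist)


def cv_error(X, y, k=10, n_folds=10, external=True, seed=0):
    """n_folds-fold cross-validated error rate of the select-then-classify rule."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    # Internal (biased) variant: genes chosen once, using every sample,
    # so the held-out samples have already influenced the rule.
    genes_all = None if external else select_genes(X, y, k)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # External variant: selection is repeated on the training part only.
        genes = select_genes(X_tr, y_tr, k) if external else genes_all
        centroids = fit_centroids(X_tr, y_tr, genes)
        errors += sum(predict(centroids, genes, X[i]) != y[i] for i in test)
    return errors / len(y)


def make_noise_data(n_samples=40, n_genes=1000, seed=1):
    """Synthetic data with no real class signal: labels are unrelated to X."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data()
    print("internal selection:", cv_error(X, y, external=False))
    print("external selection:", cv_error(X, y, external=True))
```

Because the synthetic labels carry no information about the expression values, any honest error estimate should sit near chance level. The internal variant would be expected to report a much lower error, which is the selection bias the abstract describes; the external variant removes it by treating gene selection as part of training.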

Citing Articles

Bioinformatics analysis of shared biomarkers and immune pathways of preeclampsia and periodontitis.

Ruan F, Wang Y, Ying X, Liu Y, Xu J, Zhao H. BMC Pregnancy Childbirth. 2025; 25(1):217.

PMID: 40016711 PMC: 11866586. DOI: 10.1186/s12884-025-07277-w.


Utilizing machine-learning techniques on MRI radiomics to identify primary tumors in brain metastases.

Yang W, Su X, Li S, Zhao K, Yue Q. Front Neurol. 2025; 15:1474461.

PMID: 39835148 PMC: 11743164. DOI: 10.3389/fneur.2024.1474461.


Bioinformatics analysis of effective biomarkers and immune infiltration in type 2 diabetes with cognitive impairment and aging.

Wang Q, Yang Y. Sci Rep. 2024; 14(1):23279.

PMID: 39375405 PMC: 11488262. DOI: 10.1038/s41598-024-74480-8.


Feature selection by replicate reproducibility and non-redundancy.

Capraz T, Huber W. Bioinformatics. 2024; 40(9).

PMID: 39254597 PMC: 11410923. DOI: 10.1093/bioinformatics/btae548.


Identification of a serum proteomic biomarker panel using diagnosis specific ensemble learning and symptoms for early pancreatic cancer detection.

Ney A, Nene N, Sedlak E, Acedo P, Blyuss O, Whitwell H. PLoS Comput Biol. 2024; 20(8):e1012408.

PMID: 39208354 PMC: 11389906. DOI: 10.1371/journal.pcbi.1012408.

