
Optimized Between-group Classification: a New Jackknife-based Gene Selection Procedure for Genome-wide Expression Data

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2005 Sep 30
PMID: 16191195
Citations: 2
Abstract

Background: A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method, which is based on performing multivariate ordination of groups, proved to be very efficient both for classifying samples into pre-defined groups and for predicting the disease class of new, unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the prediction power of BGA.
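To make "ordination of groups" concrete, the block below writes out the quantities involved, assuming the PCA-based variant of BGA with a Euclidean metric and uniform sample weights (a correspondence-analysis variant would weight rows and columns differently). The notation ($x_i$, $G_k$, $n_k$, $\bar{x}_k$, $T$, $W$, $B$, $S_B$) is introduced here for illustration and is not taken from the article. Under these assumptions, BGA amounts to an eigen-decomposition of the between-group scatter matrix $S_B$, whose rank is at most $K-1$ and whose trace equals $B$.

```latex
% x_i in R^p: expression profile of sample i (i = 1..n) over p genes
% G_1..G_K: pre-defined groups of sizes n_k; \bar{x}_k: centroid of G_k; \bar{x}: overall centroid
\[
\begin{aligned}
T &= \frac{1}{n}\sum_{i=1}^{n}\lVert x_i-\bar{x}\rVert^2
  && \text{(total inertia)}\\
W &= \frac{1}{n}\sum_{k=1}^{K}\sum_{i\in G_k}\lVert x_i-\bar{x}_k\rVert^2
  && \text{(within-group inertia)}\\
B &= \sum_{k=1}^{K}\frac{n_k}{n}\lVert \bar{x}_k-\bar{x}\rVert^2
  && \text{(between-group inertia)}\\
T &= B + W, \qquad \%B = 100\,\frac{B}{T}\\
S_B &= \sum_{k=1}^{K}\frac{n_k}{n}\,(\bar{x}_k-\bar{x})(\bar{x}_k-\bar{x})^{\top},
  \qquad \operatorname{tr}(S_B) = B
\end{aligned}
\]
```

Samples are then projected onto the leading eigenvectors of $S_B$; as an assumption about how BGA prediction is usually carried out, a new sample would be assigned to the group whose projected centroid is nearest.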

Results: We propose an optimized between-group classification (OBC), which uses a jackknife-based gene selection procedure. OBC emphasizes classification accuracy rather than feature selection. OBC is a backward optimization procedure that maximizes the percentage of between-group inertia by removing the least influential genes one by one from the analysis. This selects a subset of highly discriminative genes that optimizes disease class prediction. We applied OBC to four datasets and compared it to other classification methods.
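The backward procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the inertia ratio uses squared Euclidean distances with uniform sample weights; the jackknife step is read as leaving each remaining gene out in turn and dropping the gene whose removal yields the highest between-group inertia ratio; and the names `between_inertia_ratio`, `backward_jackknife` and `min_genes` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = genes


def between_inertia_ratio(X: Matrix, labels: Sequence[Hashable],
                          genes: Sequence[int]) -> float:
    """Share of total inertia lying between group centroids (B / T),
    using only the columns listed in `genes`. Uniform sample weights assumed."""
    n = len(X)
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for row, lab in zip(X, labels):
        groups.setdefault(lab, []).append(row)
    grand = {g: sum(row[g] for row in X) / n for g in genes}
    # Both sums omit the common 1/n factor, which cancels in the ratio.
    total = sum((row[g] - grand[g]) ** 2 for row in X for g in genes)
    between = 0.0
    for rows in groups.values():
        n_k = len(rows)
        for g in genes:
            centroid = sum(r[g] for r in rows) / n_k
            between += n_k * (centroid - grand[g]) ** 2
    return between / total if total > 0 else 0.0


def backward_jackknife(X: Matrix, labels: Sequence[Hashable],
                       min_genes: int = 1) -> List[Tuple[List[int], float]]:
    """Hypothetical backward elimination: at each step, leave each gene out in
    turn and drop the one whose removal gives the highest B / T.
    Returns the (gene subset, ratio) pair recorded after every step."""
    genes = list(range(len(X[0])))
    path = [(list(genes), between_inertia_ratio(X, labels, genes))]
    while len(genes) > min_genes:
        scores = []
        for g in genes:
            rest = [h for h in genes if h != g]
            scores.append((between_inertia_ratio(X, labels, rest), g))
        best_ratio, dropped = max(scores)  # ties go to the higher gene index
        genes = [h for h in genes if h != dropped]
        path.append((list(genes), best_ratio))
    return path


if __name__ == "__main__":
    X = [[1.0, 0.5, 3.0], [1.2, -0.5, -3.0], [5.0, 0.4, 2.9], [5.2, -0.6, -2.9]]
    labels = ["A", "A", "B", "B"]
    for subset, ratio in backward_jackknife(X, labels):
        print(subset, round(ratio, 4))
```

In this toy matrix only gene 0 separates groups A and B, so the pure-noise gene 2 is removed first, then gene 1, and the ratio rises at each step. Because the ratio alone tends to favour ever smaller subsets, a practical stopping rule would choose the subset size by predictive accuracy (for example, leave-one-out cross-validation), which this sketch leaves to the caller.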

Conclusion: OBC considerably improved the classification and predictive accuracy of BGA, as assessed using independent datasets and leave-one-out cross-validation.
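Leave-one-out cross-validation, one of the assessment schemes mentioned above, can be sketched as follows. As a simplifying assumption, a plain nearest-centroid rule in the selected gene space stands in for BGA prediction, which is assumed here to assign samples to the nearest group centroid in the BGA ordination space. `nearest_centroid_predict` and `loocv_accuracy` are hypothetical names. In a full evaluation, any gene selection would have to be repeated inside each fold so that the held-out sample never influences which genes are kept.

```python
from typing import Dict, Hashable, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = genes


def nearest_centroid_predict(train_X: Matrix, train_y: Sequence[Hashable],
                             x: Sequence[float], genes: Sequence[int]) -> Hashable:
    """Assign `x` to the training group whose centroid (over `genes`) is closest
    in squared Euclidean distance. A stand-in for BGA prediction."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for row, lab in zip(train_X, train_y):
        acc = sums.setdefault(lab, [0.0] * len(genes))
        for j, g in enumerate(genes):
            acc[j] += row[g]
        counts[lab] = counts.get(lab, 0) + 1

    def dist2(lab: Hashable) -> float:
        return sum((x[g] - sums[lab][j] / counts[lab]) ** 2
                   for j, g in enumerate(genes))

    return min(sums, key=dist2)


def loocv_accuracy(X: Matrix, y: Sequence[Hashable], genes: Sequence[int]) -> float:
    """Hold out each sample once, train on the rest, and return the
    fraction of held-out samples that are classified correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = list(X[:i]) + list(X[i + 1:])
        train_y = list(y[:i]) + list(y[i + 1:])
        if nearest_centroid_predict(train_X, train_y, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)
```

For example, with the samples `[[1.0, 0.5, 3.0], [1.2, -0.5, -3.0], [5.0, 0.4, 2.9], [5.2, -0.6, -2.9]]` labelled A, A, B, B, restricting the classifier to gene 0 classifies all four held-out samples correctly, whereas gene 2 alone misclassifies all of them.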

Availability: The R code is freely available [see Additional file 1] as well as supplementary information [see Additional file 2].

Citing Articles

Stability of gene contributions and identification of outliers in multivariate analysis of microarray data.

Baty F, Jaeger D, Preiswerk F, Schumacher M, Brutsche M. BMC Bioinformatics. 2008; 9:289.

PMID: 18570644 PMC: 2441634. DOI: 10.1186/1471-2105-9-289.


Expression profiling in granulomatous lung disease.

Chen E, Moller D. Proc Am Thorac Soc. 2007; 4(1):101-7.

PMID: 17202298 PMC: 2647607. DOI: 10.1513/pats.200607-140JG.
