Classification Across Gene Expression Microarray Studies

Overview
Publisher Biomed Central
Specialty Biology
Date 2010 Jan 1
PMID 20042109
Citations 6
Abstract

Background: The increasing number of gene expression microarray studies represents an important resource in biomedical research. As a result, gene expression-based diagnosis has entered clinical practice for patient stratification in breast cancer. However, the integration and combined analysis of microarray studies remains a challenge. We assessed the potential benefit of data integration for classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework that aims to establish good statistical practice and provides a graphical way to monitor differences. The classification goal was to correctly predict the estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor sample in an independent study that was not used for training. For classification we chose support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs (kTSP). Guided by considerations relevant for classification across studies, we developed a generalization of kTSP, which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic invariance of kTSP with respect to technologies and preprocessing.
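The invariance mentioned above comes from kTSP using only the within-sample ordering of gene pairs, which any monotone transformation of a sample's values leaves unchanged. The following is a minimal sketch of a plain kTSP classifier, not the derived version (DV) described in the abstract. The function names `fit_ktsp` and `predict_ktsp`, the data layout (one list of expression values per sample) and the 0/1 label coding are assumptions made for illustration, and both classes are assumed to be present in the training data.

```python
from itertools import combinations
import math


def fit_ktsp(X, y, k=3):
    """Pick up to k disjoint gene pairs whose ordering differs most between classes.

    X: list of samples, each a list of expression values (hypothetical layout).
    y: list of 0/1 class labels; both classes must occur at least once.
    Returns a list of (i, j, sign) tuples.
    """
    n_genes = len(X[0])
    n0 = sum(1 for label in y if label == 0)
    n1 = len(y) - n0
    scored = []
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i lies below gene j.
        p0 = sum(1 for row, lab in zip(X, y) if lab == 0 and row[i] < row[j]) / n0
        p1 = sum(1 for row, lab in zip(X, y) if lab == 1 and row[i] < row[j]) / n1
        delta = p1 - p0
        if delta != 0:
            scored.append((abs(delta), i, j, 1 if delta > 0 else -1))
    scored.sort(key=lambda t: -t[0])  # stable sort: ties keep index order
    pairs, used = [], set()
    for _, i, j, sign in scored:
        if len(pairs) == k:
            break
        if i in used or j in used:
            continue  # keep the selected pairs disjoint
        pairs.append((i, j, sign))
        used.update((i, j))
    return pairs


def predict_ktsp(pairs, X):
    """Majority vote over the selected pairs; a tied vote falls back to class 0."""
    preds = []
    for row in X:
        votes = sum(sign * (1 if row[i] < row[j] else -1) for i, j, sign in pairs)
        preds.append(1 if votes > 0 else 0)
    return preds


# Toy data: gene 0 is below gene 1 in class 0 and above it in class 1.
X = [[1, 2, 5], [1, 3, 5], [2, 1, 5], [3, 1, 5]]
y = [0, 0, 1, 1]
pairs = fit_ktsp(X, y, k=1)
# Predictions do not change when every value is passed through exp().
X_exp = [[math.exp(v) for v in row] for row in X]
same = predict_ktsp(pairs, X) == predict_ktsp(pairs, X_exp)
```

Because only comparisons within a sample are used, the sketch needs no normalization across studies. This is the property that makes rank-based rules attractive when training and test data come from different platforms.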

Results: For each individual study, the generalization error was benchmarked via complete cross-validation and was found to be similar for all classification methods. The misclassification rates were substantially higher in classification across studies, where each single study was used as an independent test set while all remaining studies were combined to train the classifier. However, as the number of independent microarray studies used in training increased, the overall classification performance improved. DV performed better than the average and showed slightly less variance. In particular, the better predictive results of DV in cross-platform classification indicate higher robustness of the classifier when trained on single-channel data and applied to gene expression ratios.
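The across-studies protocol described here, in which each study is held out once while the remaining studies are pooled for training, can be sketched as follows. This is an illustrative outline and not the authors' evaluation framework. The helper names, the dictionary layout of `studies` and the nearest-centroid stand-in classifier are assumptions; any of the classifiers named in the abstract could be plugged in through the `fit` and `predict` arguments.

```python
def fit_centroids(X, y):
    """Stand-in classifier (an assumption): the per-class mean of each feature."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, X):
    """Assign each sample to the class with the nearest centroid."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(row, centroids[lab])) for row in X]


def leave_one_study_out(studies, fit, predict):
    """Hold out each study once and train on all remaining studies pooled together.

    studies: dict mapping a study name to (X, y), with X a list of samples.
    Returns a dict mapping each held-out study to its misclassification rate.
    """
    errors = {}
    for held_out, (X_test, y_test) in studies.items():
        X_train, y_train = [], []
        for name, (X, y) in studies.items():
            if name != held_out:
                X_train.extend(X)
                y_train.extend(y)
        model = fit(X_train, y_train)
        y_pred = predict(model, X_test)
        wrong = sum(1 for p, t in zip(y_pred, y_test) if p != t)
        errors[held_out] = wrong / len(y_test)
    return errors


# Toy studies: the same two well-separated classes with a small per-study offset.
base = [[0, 0], [1, 1], [10, 10], [11, 11]]
labels = [0, 0, 1, 1]
studies = {
    name: ([[v + shift for v in row] for row in base], labels)
    for name, shift in [("A", 0.0), ("B", 0.5), ("C", 1.0)]
}
errors = leave_one_study_out(studies, fit_centroids, predict_centroids)
```

Any gene selection or parameter tuning would need to happen inside `fit`, on the training studies only, so that the held-out study stays independent in the same sense as the complete cross-validation mentioned above.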

Conclusions: We present a systematic evaluation of strategies for the integration of independent microarray studies in a classification task. Our findings on classification across studies may guide further research aimed at constructing more robust and reliable methods for stratification and diagnosis in clinical practice.

Citing Articles

Bayesian multi-source regression and monocyte-associated gene expression predict BCL-2 inhibitor resistance in acute myeloid leukemia.

White B, Khan S, Mason M, Ammad-Ud-Din M, Potdar S, Malani D. NPJ Precis Oncol. 2021; 5(1):71.

PMID: 34302041 PMC: 8302655. DOI: 10.1038/s41698-021-00209-9.


A Comparison of Logistic Regression, Logic Regression, Classification Tree, and Random Forests to Identify Effective Gene-Gene and Gene-Environmental Interactions.

Yoo W, Ference B, Cote M, Schwartz A. Int J Appl Sci Technol. 2013; 2(7):268.

PMID: 23795347 PMC: 3686280.


Configurable pattern-based evolutionary biclustering of gene expression data.

Pontes B, Giraldez R, Aguilar-Ruiz J. Algorithms Mol Biol. 2013; 8(1):4.

PMID: 23433178 PMC: 3668234. DOI: 10.1186/1748-7188-8-4.


Multiple-platform data integration method with application to combined analysis of microarray and proteomic data.

Wu S, Xu Y, Feng Z, Yang X, Wang X, Gao X. BMC Bioinformatics. 2012; 13:320.

PMID: 23198695 PMC: 3770449. DOI: 10.1186/1471-2105-13-320.


Improving biomarker list stability by integration of biological knowledge in the learning process.

Sanavia T, Aiolli F, Da San Martino G, Bisognin A, Di Camillo B. BMC Bioinformatics. 2012; 13 Suppl 4:S22.

PMID: 22536969 PMC: 3314566. DOI: 10.1186/1471-2105-13-S4-S22.

