
Classifying Gene Expression Profiles from Pairwise mRNA Comparisons

Overview
Date 2006 May 2
PMID 16646797
Citations 148
Abstract

We present a new approach to molecular classification based on mRNA comparisons. Our method, referred to as the top-scoring pair(s) (TSP) classifier, is motivated by current technical and practical limitations in using gene expression microarray data for class prediction, for example, to detect disease, identify tumors or predict treatment response. Accurate statistical inference from such data is difficult due to the small number of observations, typically tens, relative to the large number of genes, typically thousands. Moreover, conventional methods from machine learning lead to decisions which are usually very difficult to interpret in simple or biologically meaningful terms. In contrast, the TSP classifier provides decision rules which i) involve very few genes and only relative expression values (e.g., comparing the mRNA counts within a single pair of genes); ii) are both accurate and transparent; and iii) provide specific hypotheses for follow-up studies. In particular, the TSP classifier achieves prediction rates with standard cancer data that are as high as those of previous studies which use considerably more genes and complex procedures. Finally, the TSP classifier is parameter-free, thus avoiding the type of over-fitting and inflated estimates of performance that result when not all aspects of learning a predictor are properly cross-validated.
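The pairwise decision rule described above can be illustrated with a minimal Python sketch of a single top-scoring pair. It assumes the score commonly given for TSP: for genes i and j, the score is |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, estimated by within-class frequencies in the training samples, and the pair with the largest score is kept. The function names `tsp_fit` and `tsp_predict`, the 0/1 class labels, and the toy data are hypothetical. Ties are broken by keeping the first pair found, which is a simplification; this sketch does not reproduce any secondary tie-breaking rule the original method may use.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Model: (score, gene_i, gene_j, p0, p1), where pK = fraction of class-K
# training samples in which expression of gene_i < expression of gene_j.
Model = Tuple[float, int, int, float, float]


def tsp_fit(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Hypothetical sketch: find the top-scoring gene pair.

    X is a samples-by-genes expression matrix; y holds labels 0 or 1.
    Only the relative order of two genes within a sample is used, and
    there are no tuning parameters to choose.
    """
    class0: List[Sequence[float]] = [row for row, lab in zip(X, y) if lab == 0]
    class1: List[Sequence[float]] = [row for row, lab in zip(X, y) if lab == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Frequency of the event X_i < X_j within each class.
        p0 = sum(row[i] < row[j] for row in class0) / len(class0)
        p1 = sum(row[i] < row[j] for row in class1) / len(class1)
        score = abs(p0 - p1)
        # Strict ">" keeps the first pair among ties (simplification).
        if best is None or score > best[0]:
            best = (score, i, j, p0, p1)
    return best


def tsp_predict(model: Model, x: Sequence[float]) -> int:
    """Assign x to the class in which its observed ordering is more frequent."""
    _, i, j, p0, p1 = model
    if x[i] < x[j]:
        return 1 if p1 > p0 else 0
    return 0 if p1 > p0 else 1
```

For example, with training matrix `[[1, 5, 3], [2, 6, 1], [7, 2, 4], [8, 1, 9]]` and labels `[0, 0, 1, 1]`, gene 0 is below gene 1 in every class-0 sample and in no class-1 sample, so `tsp_fit` selects the pair (0, 1) with score 1.0, and `tsp_predict` returns 0 for `[3, 10, 0]`. Scanning all pairs costs on the order of G^2 comparisons per sample for G genes, which is feasible for thousands of genes.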

Citing Articles

Robust Cluster Prediction Across Data Types Validates Association of Sex and Therapy Response in GBM.

Gibbs D, Cioffi G, Aguilar B, Waite K, Pan E, Mandel J. Cancers (Basel). 2025; 17(3).

PMID: 39941811 PMC: 11815886. DOI: 10.3390/cancers17030445.


ITree: a user-driven tool for interactive decision-making with classification trees.

Sokolowski H, Czajkowski M, Czajkowska A, Jurczuk K, Kretowski M. Bioinformatics. 2024; 40(5).

PMID: 38640482 PMC: 11091738. DOI: 10.1093/bioinformatics/btae273.


Excavation of gene markers associated with pancreatic ductal adenocarcinoma based on interrelationships of gene expression.

Zhang Z, Sun Z, Gao D, Hao Y, Lin H, Liu F. IET Syst Biol. 2024; 18(6):261-270.

PMID: 38530028 PMC: 11665842. DOI: 10.1049/syb2.12090.


Ensemble methods of rank-based trees for single sample classification with gene expression profiles.

Lu M, Yin R, Chen X. J Transl Med. 2024; 22(1):140.

PMID: 38321494 PMC: 10848444. DOI: 10.1186/s12967-024-04940-2.


Distinct mesenchymal cell states mediate prostate cancer progression.

Pakula H, Omar M, Carelli R, Pederzoli F, Fanelli G, Pannellini T. Nat Commun. 2024; 15(1):363.

PMID: 38191471 PMC: 10774315. DOI: 10.1038/s41467-023-44210-1.

