
Clustering Cancer Gene Expression Data: a Comparative Study

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2008 Nov 29
PMID: 19038021
Citations: 86
Abstract

Background: The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context.

Results/conclusion: We present the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Our results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods. The data sets analyzed in this study are available at http://algorithmics.molgen.mpg.de/Supplements/CompCancer/.
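To make "recovering the true structure of the data sets" concrete, the sketch below scores a clustering against known class labels with the adjusted Rand index. The abstract does not name its validation criteria, so the choice of index is an assumption made for illustration, and the function name `adjusted_rand_index` and the toy label vectors are hypothetical. A score of 1 means the partition matches the known classes exactly, while values near 0 indicate agreement no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Agreement between two partitions of the same samples, corrected for chance.

    Illustrative only: the abstract does not state which index the study used.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no sample pairs to disagree on

    # Contingency counts: samples per (class, cluster) cell, per class, per cluster.
    cell_counts = Counter(zip(true_labels, cluster_labels))
    class_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)

    # Number of sample pairs that fall together in each grouping.
    pairs_both = sum(comb(c, 2) for c in cell_counts.values())
    pairs_class = sum(comb(c, 2) for c in class_counts.values())
    pairs_cluster = sum(comb(c, 2) for c in cluster_counts.values())

    expected = pairs_class * pairs_cluster / comb(n, 2)
    maximum = (pairs_class + pairs_cluster) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single group
    return (pairs_both - expected) / (maximum - expected)


# Toy labels (hypothetical): two known classes, two candidate clusterings.
known_classes = [0, 0, 0, 1, 1, 1]
clustering_a = [1, 1, 1, 0, 0, 0]  # same partition, cluster names swapped
clustering_b = [0, 0, 1, 1, 2, 2]  # three clusters that cut across the classes

score_a = adjusted_rand_index(known_classes, clustering_a)
score_b = adjusted_rand_index(known_classes, clustering_b)
```

In a comparison like the one described, each method would be run over a range of cluster counts on every data set. The best-scoring partition would then be reported alongside the gap between its cluster count and the actual number of classes.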

Citing Articles

Multi-way overlapping clustering by Bayesian tensor decomposition.

Wang Z, Zhou F, He K, Ni Y. Stat Interface. 2024; 17(2):219-230.

PMID: 39713480 PMC: 11661849. DOI: 10.4310/23-sii790.


Evaluation of agreement between common clustering strategies for DNA methylation-based subtyping of breast tumours.

Zarean E, Li S, Wong E, Makalic E, Milne R, Giles G. Epigenomics. 2024; 17(2):105-114.

PMID: 39711216 PMC: 11792870. DOI: 10.1080/17501911.2024.2441653.


Principles of artificial intelligence in radiooncology.

Huang Y, Gomaa A, Hofler D, Schubert P, Gaipl U, Frey B. Strahlenther Onkol. 2024; 201(3):210-235.

PMID: 39105746 PMC: 11839771. DOI: 10.1007/s00066-024-02272-0.


Methods in DNA methylation array dataset analysis: A review.

Sahoo K, Sundararajan V. Comput Struct Biotechnol J. 2024; 23:2304-2325.

PMID: 38845821 PMC: 11153885. DOI: 10.1016/j.csbj.2024.05.015.


Multi-Input data ASsembly for joint Analysis (MIASA): A framework for the joint analysis of disjoint sets of variables.

Raharinirina N, Sunkara V, von Kleist M, Fackeldey K, Weber M. PLoS One. 2024; 19(5):e0302425.

PMID: 38728301 PMC: 11086896. DOI: 10.1371/journal.pone.0302425.

