An Enhanced Deterministic K-Means Clustering Algorithm for Cancer Subtype Prediction from Gene Expression Data
Background: Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature limits the applicability of algorithms such as K-Means in areas such as cancer subtype prediction from gene expression data, and it makes their results hard to compare meaningfully with those of other algorithms. In K-Means, the non-determinism stems from the random selection of data points as initial centroids.
Method: We propose an improved, density-based version of K-Means that uses a novel, systematic method for selecting initial centroids. The key idea is to choose as initial centroids data points that lie in dense regions and are adequately separated from one another in feature space.
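The abstract does not state how density or separation are measured, so the following is only a minimal Python sketch under assumed definitions, not the authors' published procedure. It assumes that density is the inverse mean distance to the `n_neighbors` nearest points, and that separation is enforced by choosing each further seed to maximize density multiplied by its distance to the nearest seed already chosen. The names `knn_density`, `select_initial_centroids` and `kmeans`, and the parameter `n_neighbors`, are hypothetical. Because no random number generator is involved and ties go to the lowest index, repeated runs on the same data give identical partitions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_density(data: Sequence[Point], n_neighbors: int) -> List[float]:
    """Hypothetical density score: inverse of the mean distance to the nearest neighbours."""
    scores = []
    for i, p in enumerate(data):
        d = sorted(dist(p, q) for j, q in enumerate(data) if j != i)
        m = min(n_neighbors, len(d))
        mean = sum(d[:m]) / m if m else 0.0
        scores.append(1.0 / (mean + 1e-12))  # small epsilon avoids division by zero
    return scores


def select_initial_centroids(data: Sequence[Point], k: int, n_neighbors: int = 3) -> List[Point]:
    """Deterministically pick k seeds that are dense and well separated (assumed rule).

    First seed: the densest point. Each later seed: the point maximizing
    density * (distance to the nearest seed already chosen). Ties go to the
    lowest index, so no randomness is involved.
    """
    if not 1 <= k <= len(data) or n_neighbors < 1:
        raise ValueError("need 1 <= k <= len(data) and n_neighbors >= 1")
    density = knn_density(data, n_neighbors)
    chosen = [max(range(len(data)), key=lambda i: (density[i], -i))]
    while len(chosen) < k:
        best, best_score = -1, 0.0
        for i, p in enumerate(data):
            score = density[i] * min(dist(p, data[c]) for c in chosen)
            if score > best_score:  # strict '>' keeps the lowest index on ties
                best, best_score = i, score
        if best < 0:
            raise ValueError("fewer than k distinct points in the data")
        chosen.append(best)
    return [tuple(data[i]) for i in chosen]


def kmeans(data: Sequence[Point], k: int, n_neighbors: int = 3, max_iter: int = 100):
    """Standard Lloyd iterations seeded by select_initial_centroids (fully deterministic)."""
    centroids = select_initial_centroids(data, k, n_neighbors)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties broken by lowest centroid index.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in data]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Two tight groups plus one isolated point between them.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (2.5, 2.5)]
    print(select_initial_centroids(data, 2))
    print(kmeans(data, 2))
```

In this toy example the isolated point at (2.5, 2.5) receives a low density score and is never chosen as a seed, whereas random initialization could pick it. The density-times-distance rule is one simple way to combine "dense" and "well separated" without a tuned distance threshold; the original method may use a different criterion.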
Results: We compared the proposed algorithm with eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm used for cancer data classification, evaluating their performance on ten cancer gene expression datasets. The proposed algorithm showed better overall performance than the others.
Conclusion: There is a pressing need in the biomedical domain for simple, easy-to-use and more accurate machine learning tools for cancer subtype prediction. The proposed algorithm is simple, easy to use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data.