K-means May Perform As Well As Mixture Model Clustering but May Also Be Much Worse: Comment on Steinley and Brusco (2011)
Steinley and Brusco (2011) presented the results of a large-scale simulation study evaluating the cluster recovery of mixture model clustering (MMC), both when the number of clusters is known and when it is unknown. On the basis of this study they drew rather strong conclusions, especially regarding the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that KM may perform as well as MMC in certain situations, which are primarily the situations Steinley and Brusco investigated. However, a weakness of their paper is that it does not investigate many important real-world situations in which theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides additional simulation results showing that KM may perform much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood, and I comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
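The equivalence mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the restriction, so the code below assumes the standard form of the result: a Gaussian mixture with equal mixing proportions and a common spherical covariance `sigma2 * I`, with each observation assigned to the component that maximizes its contribution to the classification log-likelihood. Under those assumptions the assignment reduces to choosing the nearest centroid, which is the K-means assignment step. The function names, centroids, and data are hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_centroid(x, centroids):
    """K-means assignment step: index of the centroid with the
    smallest squared Euclidean distance to x."""
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    return d2.index(min(d2))


def classification_step(x, centroids, sigma2=1.0):
    """Assignment under the assumed restricted mixture: equal mixing
    proportions 1/K and a common spherical covariance sigma2 * I.
    Returns the component with the largest log(pi_k * density_k(x))."""
    k = len(centroids)
    p = len(x)
    scores = []
    for c in centroids:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        log_density = -0.5 * p * math.log(2 * math.pi * sigma2) - d2 / (2 * sigma2)
        scores.append(math.log(1.0 / k) + log_density)
    return scores.index(max(scores))


# Hypothetical centroids and data, for illustration only.
random.seed(0)
centroids = [(0.0, 0.0), (3.0, 1.0), (-2.0, 4.0)]
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(500)]

km_labels = [nearest_centroid(x, centroids) for x in points]
mm_labels = [classification_step(x, centroids) for x in points]

agreement = sum(a == b for a, b in zip(km_labels, mm_labels)) / len(points)
print(f"share of identical assignments: {agreement:.3f}")
```

Only the squared distance differs across components in `classification_step`, because the proportion and normalizing terms are identical for every component. Once the mixture is allowed unequal proportions or component-specific covariances, those terms no longer cancel and the two assignment rules can diverge, which is the kind of situation in which the abstract argues MMC should outperform KM.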