Some Statistical Considerations In Clustering With Binary Data
A statistical theory of cluster homogeneity is developed for objects scored on binary (0,1) variables. The theory utilizes two test statistics originally suggested by Tryon and Bailey (1970). The exact sampling distribution of the statistic $H^2_{gr}$, "squared homogeneity for cluster g on variable r," is derived under the assumption of a random assortment of 0's and 1's in the observed clusters. Formulas are derived for the mean and variance of $H^2_g$, "the overall homogeneity for cluster g across all variables," which may be used in conjunction with probability inequalities to carry out significance tests on this statistic. Comments concerning a framework for deriving metric distances between objects scored only on binary variables are also included.
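The idea of pairing a known mean and variance with a probability inequality can be illustrated with a minimal sketch. The abstract does not say which inequality the authors use; the one-sided Chebyshev (Cantelli) inequality is assumed here purely for illustration, and the function name and numeric inputs are hypothetical rather than taken from the paper.

```python
def cantelli_upper_tail_bound(mean, variance, observed):
    """Distribution-free upper bound on P(X >= observed).

    Uses the one-sided Chebyshev (Cantelli) inequality:
        P(X - mean >= k) <= variance / (variance + k**2),  k > 0.
    This is an assumed choice of inequality, not necessarily the paper's.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    excess = observed - mean
    if excess <= 0:
        # At or below the null mean the inequality is uninformative.
        return 1.0
    return variance / (variance + excess ** 2)


# Hypothetical null mean and variance for an overall homogeneity statistic,
# and a hypothetical observed value; none of these come from the paper.
bound = cantelli_upper_tail_bound(mean=0.20, variance=0.0025, observed=0.45)
print(f"Conservative p-value bound: {bound:.4f}")
```

Because the bound holds for any distribution with the stated mean and variance, a test built on it is conservative: a small bound supports rejecting random assortment, while a large bound is inconclusive.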