Some Statistical Considerations In Clustering With Binary Data

Overview
Date 2016 Jan 30
PMID 26821670
Citations 1
Abstract

A statistical theory of cluster homogeneity is developed for objects scored on binary (0,1) variables. The theory utilizes two test statistics originally suggested by Tryon and Bailey (1970). The exact sampling distribution of the statistic $H_{gr}^2$, "squared homogeneity for cluster g on variable r," is derived under the assumption of a random assortment of 0's and 1's in the observed clusters. Formulas are derived for the mean and variance of $H_g^2$, "the overall homogeneity for cluster g across all variables"; these may be used in conjunction with probability inequalities to carry out significance tests on this statistic. Comments concerning a framework for deriving metric distances between objects scored only on binary variables are also included.
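
The idea of an exact sampling distribution under random assortment can be illustrated with a small sketch. If a cluster of n objects is formed at random from N objects, K of which score 1 on a given variable, the number of 1's in the cluster follows a hypergeometric distribution, so the null distribution of any statistic that depends only on that count can be enumerated exactly. The abstract does not give the definition of the squared homogeneity statistic, so the function `assumed_h2` below is a hypothetical stand-in (one minus the ratio of within-cluster variance to total variance), not Tryon and Bailey's formula; the function names and the example numbers are likewise assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k ones in a random cluster of size n), given K ones among N objects."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def assumed_h2(k, n, K, N):
    """Hypothetical stand-in statistic (NOT the paper's definition):
    1 - (within-cluster variance) / (total variance) for one binary variable.
    Requires 0 < K < N so that the total variance is nonzero."""
    p_cluster = Fraction(k, n)
    p_total = Fraction(K, N)
    return 1 - (p_cluster * (1 - p_cluster)) / (p_total * (1 - p_total))


def exact_distribution(N, K, n):
    """Exact null distribution of assumed_h2, returned as {value: probability}."""
    dist = {}
    # k ranges over every count of ones the cluster can possibly contain.
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        value = assumed_h2(k, n, K, N)
        dist[value] = dist.get(value, 0) + hypergeom_pmf(k, N, K, n)
    return dist


def upper_tail_p(observed_k, N, K, n):
    """Probability, under random assortment, of a statistic at least as large as observed."""
    observed = assumed_h2(observed_k, n, K, N)
    return sum(p for value, p in exact_distribution(N, K, n).items() if value >= observed)
```

For example, with N = 20 objects, K = 10 ones, and a cluster of n = 5 objects that all score 1, `upper_tail_p(5, 20, 10, 5)` returns `Fraction(21, 646)`, roughly 0.033: a perfectly homogeneous cluster (all 0's or all 1's) would arise by chance about 3% of the time. Exact rational arithmetic is used here so that tied statistic values are merged without floating-point error.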

Citing Articles

A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

Brusco M, Shireman E, Steinley D. Psychol Methods. 2016; 22(3):563-580.

PMID: 27607543 PMC: 5982597. DOI: 10.1037/met0000095.