Minimum Cross-entropy Pattern Classification and Cluster Analysis
Overview
This paper considers the problem of classifying an input vector of measurements by a nearest neighbor rule applied to a fixed set of vectors. The fixed vectors are sometimes called characteristic feature vectors, codewords, cluster centers, models, reproductions, etc. The nearest neighbor rule considered uses a non-Euclidean information-theoretic distortion measure that is not a metric, but that nevertheless leads to a classification method that is optimal in a well-defined sense and is also computationally attractive. Furthermore, the distortion measure results in a simple method of computing cluster centroids. Our approach is based on the minimization of cross-entropy (also called discrimination information, directed divergence, K-L number), and can be viewed as a refinement of a general classification method due to Kullback. The refinement exploits special properties of cross-entropy that hold when the probability densities involved happen to be minimum cross-entropy densities. The approach is a generalization of a recently developed technique for speech coding by vector quantization.
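As a concrete illustration of the nearest neighbor rule and the centroid computation described above, the following Python sketch classifies a discrete probability vector by minimum directed divergence against a fixed set of reference vectors, and computes a cluster centroid as the (renormalized) arithmetic mean, which minimizes the summed divergence from the cluster members to the centroid. The discrete setting, the example reference vectors, and the epsilon smoothing are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch (assumed discrete probability vectors, not the paper's
# exact formulation): nearest neighbor classification under the directed
# divergence D(p || q), plus the simple centroid rule this divergence admits.
import numpy as np

def directed_divergence(p, q, eps=1e-12):
    """D(p || q) = sum_k p_k log(p_k / q_k); eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def classify(x, references):
    """Index of the reference vector with minimum divergence D(x || r)."""
    return int(np.argmin([directed_divergence(x, r) for r in references]))

def centroid(members):
    """The q minimizing sum_i D(p_i || q) over the simplex is the
    arithmetic mean of the members (renormalized)."""
    m = np.mean(np.asarray(members, dtype=float), axis=0)
    return m / m.sum()

if __name__ == "__main__":
    refs = [np.array([0.7, 0.2, 0.1]),   # hypothetical reference densities
            np.array([0.1, 0.3, 0.6])]
    x = np.array([0.6, 0.25, 0.15])      # input vector to classify
    print("assigned to reference", classify(x, refs))
    print("cluster centroid:", centroid([x, refs[0]]))
```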