A Cross Entropy Test Allows Quantitative Statistical Comparison of t-SNE and UMAP Representations
The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools because robust statistical approaches have been lacking. Here, we derive a statistical test for evaluating the difference between dimensionality-reduced datasets: the Kolmogorov-Smirnov test is applied to the distributions of single-cell cross entropy within each dataset. Because the approach compares datasets through the inter-relationships of single cells, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing multiple samples to be organized into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
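The pipeline described in the abstract (per-cell cross entropy, a two-sample Kolmogorov-Smirnov test, then a dendrogram built from pairwise distances) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: the affinities are simplified (a fixed-bandwidth Gaussian kernel with parameter sigma in the original space and an unnormalized 1/(1 + d^2) kernel in the embedding, with no perplexity calibration or nearest-neighbor graph); the per-pair term is the binary cross-entropy term of UMAP's objective; the KS statistic D is used directly as the distance; and average linkage is used for the dendrogram. The helper names per_cell_cross_entropy, ks_two_sample, average_linkage and compare_samples are hypothetical, and the embedding Y is taken as precomputed.

```python
import math
from itertools import combinations


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def per_cell_cross_entropy(X, Y, sigma=1.0, eps=1e-12):
    """Per-cell cross entropy between original data X and embedding Y.

    Assumption: fixed-bandwidth Gaussian affinities in X and an unnormalized
    1 / (1 + d^2) kernel in Y; the paper's exact affinities may differ.
    """
    n = len(X)
    ce = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            p = math.exp(-_sq_dist(X[i], X[j]) / (2 * sigma ** 2))
            q = 1.0 / (1.0 + _sq_dist(Y[i], Y[j]))
            # Clamp away from 0 and 1 so both log terms stay finite.
            p = min(max(p, eps), 1 - eps)
            q = min(max(q, eps), 1 - eps)
            # Binary cross-entropy term as in UMAP's objective (always >= 0).
            total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
        ce.append(total)
    return ce


def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic (approximate) p-value."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, tracking the largest gap between the ECDFs.
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0  # Kolmogorov tail probability is ~1 for such small lambda.
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def average_linkage(names, dist):
    """Average-linkage merges; dist maps frozenset({a, b}) -> distance."""
    clusters = {name: [name] for name in names}
    merges = []

    def cluster_dist(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda c: cluster_dist(*c))
        height = cluster_dist(c1, c2)
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2, height))
    return merges


def compare_samples(samples, sigma=1.0):
    """samples: name -> (X, Y). Returns pairwise (D, p) results and merges."""
    ce = {name: per_cell_cross_entropy(X, Y, sigma) for name, (X, Y) in samples.items()}
    results = {frozenset((a, b)): ks_two_sample(ce[a], ce[b]) for a, b in combinations(ce, 2)}
    dist = {key: d for key, (d, _p) in results.items()}
    return results, average_linkage(list(ce), dist)
```

In this sketch each sample contributes a distribution of per-cell values rather than a single summary number, which is what lets the KS test compare whole datasets. The pure-Python loops are quadratic in the number of cells and are meant only to show the logic; for real data, scipy.stats.ks_2samp and scipy.cluster.hierarchy.linkage are established alternatives for the last two steps.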