A Cross Entropy Test Allows Quantitative Statistical Comparison of T-SNE and UMAP Representations

Overview
Specialty Cell Biology
Date 2023 Feb 23
PMID 36814837
Abstract

The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
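The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the per-cell cross entropy uses a t-SNE-style form, -sum_j p_j log(q_j), over one cell's similarities in the original space (p) and the embedding (q); the function names are hypothetical; and the p-value uses the standard asymptotic Kolmogorov approximation, which may differ from the authors' implementation.

```python
import math
from bisect import bisect_right


def point_cross_entropy(p_row, q_row, eps=1e-12):
    """Cross entropy of one cell: -sum_j p_j * log(q_j).

    p_row: similarities of the cell to its neighbours in the original space.
    q_row: the matching similarities in the low-dimensional embedding.
    (Assumed form; the abstract does not state the formula.)
    """
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_row, q_row) if p > 0)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for the two-sample KS statistic d with sample
    sizes n and m (an approximation; rough for small samples)."""
    if d <= 0.0:
        return 1.0
    lam = d * math.sqrt(n * m / (n + m))
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Toy per-cell cross entropies for two hypothetical samples.
    ce_sample_1 = [0.61, 0.70, 0.66, 0.73, 0.68]
    ce_sample_2 = [0.95, 1.02, 0.99, 1.10, 0.97]
    d = ks_statistic(ce_sample_1, ce_sample_2)
    print(d, ks_pvalue(d, len(ce_sample_1), len(ce_sample_2)))
```

Under the same assumptions, the pairwise statistic from `ks_statistic` could be collected into a distance matrix over several samples and passed to a hierarchical clustering routine to obtain the kind of dendrogram the abstract mentions.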

Citing Articles

A Gene-Expression Based Comparison of Murine and Human Inhibitory Interneurons in the Cerebellar Cortex and Nuclei.

Schilling K Cerebellum. 2025; 24(2):55.

PMID: 40019676 PMC: 11870911. DOI: 10.1007/s12311-025-01809-y.


Elf autoencoder for unsupervised exploration of flat-band materials using electronic band structure fingerprints.

Pentz H, Warford T, Timokhin I, Zhou H, Yang Q, Bhattacharya A Commun Phys. 2025; 8(1):25.

PMID: 39850966 PMC: 11756449. DOI: 10.1038/s42005-025-01936-2.


Application of machine learning for mass spectrometry-based multi-omics in thyroid diseases.

Che Y, Zhao M, Gao Y, Zhang Z, Zhang X Front Mol Biosci. 2025; 11:1483326.

PMID: 39741929 PMC: 11685090. DOI: 10.3389/fmolb.2024.1483326.


Using UMAP for Partially Synthetic Healthcare Tabular Data Generation and Validation.

Lazaro C, Angulo C Sensors (Basel). 2024; 24(23).

PMID: 39686380 PMC: 11645063. DOI: 10.3390/s24237843.


Peripheral endotoxin exposure in mice activates crosstalk between phagocytes in the brain and periphery.

Boles J, Uriarte Huarte O, Tansey M Res Sq. 2024.

PMID: 38883776 PMC: 11177977. DOI: 10.21203/rs.3.rs-4478250/v1.

