
Label-aware Distance Mitigates Temporal and Spatial Variability for Clustering and Visualization of Single-cell Gene Expression Data

Overview
Journal: Commun Biol
Specialty: Biology
Date: 2024 Mar 15
PMID: 38486077
Abstract

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered at different times and from different tissues and patients, introduces large between-group distances and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (LAD), a metric that uses the temporal/spatial locality of the batch effect to control for such factors. We validate LAD on simulated data and apply it to a mouse retina development dataset and a lung dataset. We also show the utility of our approach in understanding the progression of Coronavirus Disease 2019 (COVID-19). LAD provides better cell embeddings than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples and help make biological findings.
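
As a rough illustration of why plain Euclidean distance can group cells by batch rather than by identity, and how a label-aware distance matrix can be dropped into a distance-based clustering method, consider the toy sketch below. The abstract does not state the LAD formula, so `toy_label_aware_distance` is a hypothetical stand-in (it subtracts each batch's centroid before measuring Euclidean distance) and should not be read as the published metric. The data, function names, and the simple single-linkage routine are all assumptions made for this example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Plain Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_centroids(cells, labels):
    """Mean expression vector of each batch label (e.g. time point)."""
    sums, counts = {}, {}
    for vec, lab in zip(cells, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def toy_label_aware_distance(a, b, label_a, label_b, centroids):
    """Hypothetical stand-in, NOT the published LAD formula:
    remove each cell's batch centroid, then take Euclidean distance."""
    a_adj = [x - c for x, c in zip(a, centroids[label_a])]
    b_adj = [x - c for x, c in zip(b, centroids[label_b])]
    return euclidean(a_adj, b_adj)


def single_linkage(dist, n_clusters):
    """Naive single-linkage clustering on a precomputed distance matrix."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy data: two cell types (A, B) measured in two batches (t0, t1).
# Batch t1 is shifted by +10 in both genes to mimic a batch effect.
cells = [
    (0.0, 0.0), (0.2, 0.1),      # type A, batch t0
    (5.0, 0.0), (5.1, 0.2),      # type B, batch t0
    (10.0, 10.0), (10.1, 10.2),  # type A, batch t1
    (15.0, 10.0), (15.2, 10.1),  # type B, batch t1
]
labels = ["t0"] * 4 + ["t1"] * 4
n = len(cells)

centroids = batch_centroids(cells, labels)
euclid_matrix = [[euclidean(cells[i], cells[j]) for j in range(n)] for i in range(n)]
aware_matrix = [
    [toy_label_aware_distance(cells[i], cells[j], labels[i], labels[j], centroids)
     for j in range(n)]
    for i in range(n)
]

euclid_clusters = single_linkage(euclid_matrix, 2)
aware_clusters = single_linkage(aware_matrix, 2)
print("Euclidean:  ", euclid_clusters)
print("Label-aware:", aware_clusters)
```

On this toy data the Euclidean matrix splits the cells by batch, whereas the label-adjusted matrix groups them by cell type. The centroid subtraction only works here because both batches contain the same mix of cell types; real datasets generally need a more careful treatment.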

Citing Articles

Clustering and classification for dry bean feature imbalanced data.

Lee C, Wang W, Huang J Sci Rep. 2024; 14(1):31058.

PMID: 39730714 PMC: 11681048. DOI: 10.1038/s41598-024-82253-6.


Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data.

Liang S, Dou J, Iqbal R, Chen K Commun Biol. 2024; 7(1):326.

PMID: 38486077 PMC: 10940680. DOI: 10.1038/s42003-024-05988-y.
