STEAM: Spatial Transcriptomics Evaluation Algorithm and Metric for Clustering Performance

Overview
Journal bioRxiv
Date 2025 Mar 3
PMID 40027655
Abstract

Motivation: Spatial transcriptomic technologies allow researchers to explore the diversity and specificity of gene expression within its original tissue structure. Accurately identifying regions that are spatially coherent in both gene expression and physical tissue structure is an emerging but challenging topic, because the lack of ground-truth labels complicates validation of clustering consistency and reproducibility. This highlights the need for a computational evaluation framework that assesses clustering performance rigorously and without bias.

Results: To address this gap, we propose STEAM (Spatial Transcriptomics Evaluation Algorithm and Metric), a user-friendly computational pipeline that evaluates the consistency and reliability of clustering results using machine learning classification and prediction methods, with the goal of preserving spatial proximity and gene expression patterns within clusters. We benchmarked STEAM on various public datasets spanning multi-cell to single-cell resolution, as well as spatial transcriptomics and proteomics. The results highlighted its robustness and generalizability across comprehensive statistical evaluation metrics, including the kappa score, F1 score, accuracy, and adjusted Rand index. Notably, STEAM supports multi-sample training, enabling assessment of clustering consistency across replicates. Moreover, STEAM provides practical guidance by comparing clustering results across multiple approaches; here, we evaluated four methods, including both spatially aware and spatially agnostic approaches. In summary, we believe STEAM provides researchers with a promising tool for evaluating clustering robustness and benchmarking clustering performance in spatial omics data, offering valuable insights to drive reproducible discoveries in spatial biology.
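The abstract describes the evaluation idea only at a high level. One reading of it: a classifier learns to predict each spot's assigned cluster label from its spatial position and expression profile, its predictions on held-out spots are compared with the original labels, and agreement is summarised with kappa, F1, accuracy and ARI. The Python sketch below illustrates that reading under stated assumptions. It is not STEAM's R implementation, and the abstract does not say which classifiers STEAM uses. The k-nearest-neighbour classifier, the 70/30 random split, macro-averaged F1, the feature layout (coordinates concatenated with expression), and all function names (`knn_predict`, `evaluate_clustering`, and so on) are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, test_X, k=5):
    """Hypothetical stand-in classifier: majority vote among the k nearest training spots (Euclidean)."""
    preds = []
    for x in test_X:
        neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
        votes = Counter(label for _, label in neighbours)
        preds.append(votes.most_common(1)[0][0])
    return preds


def accuracy(y_true, y_pred):
    """Fraction of spots whose predicted label matches the original cluster label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance from label frequencies."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # p_e == 1 only when both vectors hold one identical label, so agreement is perfect.
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Assumed macro averaging: unweighted mean of per-cluster F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # c occurs somewhere, so denominator >= 1
    return sum(scores) / len(scores)


def adjusted_rand_index(a, b):
    """Pair-counting agreement between two partitions, adjusted for chance (needs >= 2 spots)."""
    def pairs(m):
        return m * (m - 1) / 2
    index = sum(pairs(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(pairs(v) for v in Counter(a).values())
    sum_b = sum(pairs(v) for v in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    # 0/0 only when both partitions are one cluster or both are all singletons (identical).
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


def evaluate_clustering(features, labels, test_frac=0.3, k=5, seed=0):
    """Hold out spots, predict their cluster labels from the remaining spots, and score agreement."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    n_test = max(2, int(len(idx) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    preds = knn_predict([features[i] for i in train], [labels[i] for i in train],
                        [features[i] for i in test], k=k)
    truth = [labels[i] for i in test]
    return {
        "accuracy": accuracy(truth, preds),
        "kappa": cohen_kappa(truth, preds),
        "f1": macro_f1(truth, preds),
        "ari": adjusted_rand_index(truth, preds),
    }


if __name__ == "__main__":
    # Toy data: each spot is (x, y, expression); two spatially and transcriptionally distinct clusters.
    features = [(float(i % 5), float(i // 5), 1.0) for i in range(25)]
    features += [(10.0 + i % 5, 10.0 + i // 5, 8.0) for i in range(25)]
    labels = ["A"] * 25 + ["B"] * 25
    print(evaluate_clustering(features, labels))
```

On this toy input the two clusters are far apart in both space and expression, so every held-out spot is predicted correctly and all four scores equal 1.0. A clustering whose labels cut across spatial or expression structure would be harder to predict and would score lower, which is the intuition behind using predictability as a consistency signal. A real analysis would swap in a stronger classifier, repeated or cross-validated splits, and scaled features.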

Availability And Implementation: Source code and the R software tool STEAM are available from https://github.com/fanzhanglab/STEAM.
