
Benchmarking Clustering Algorithms on Estimating the Number of Cell Types from Single-cell RNA-sequencing Data

Overview
Journal Genome Biol
Specialties Biology
Genetics
Date 2022 Feb 9
PMID 35135612
Abstract

Background: A key task in single-cell RNA-seq (scRNA-seq) data analysis is to accurately detect the number of cell types in the sample, which can be critical for downstream analyses such as cell type identification. Various scRNA-seq data clustering algorithms have been specifically designed to automatically estimate the number of cell types by optimising the number of clusters in a dataset. The lack of benchmark studies, however, complicates the choice of method.

Results: We systematically benchmark a range of popular clustering algorithms on estimating the number of cell types in a variety of settings by sampling from the Tabula Muris data to create scRNA-seq datasets with a varying number of cell types, a varying number of cells in each cell type, and different cell type proportions. The large number of datasets enables us to assess the algorithms, which cover four broad categories of approaches, from various aspects using a panel of criteria. We further cross-compare the performance on datasets with high cell numbers using Tabula Muris and Tabula Sapiens data.
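The sampling design described above can be illustrated with a minimal sketch. It assumes only a list of per-cell type annotations; the function name `sample_dataset`, its parameters, and the toy labels are hypothetical and are not taken from the study's code. Equal group sizes give a balanced dataset, unequal sizes give different cell type proportions.

```python
import random
from collections import defaultdict


def sample_dataset(labels, cells_per_type, rng):
    """Draw one benchmark dataset from an annotated pool of cells.

    labels: cell type annotation for each cell (list index = cell id).
    cells_per_type: number of cells to draw for each selected cell type;
        its length sets the number of cell types in the dataset.
    Returns a list of (cell_index, cell_type) pairs.
    """
    by_type = defaultdict(list)
    for cell_index, cell_type in enumerate(labels):
        by_type[cell_type].append(cell_index)

    # Simplification (an assumption of this sketch): a type is eligible only
    # if it has enough cells for the largest requested group.
    largest = max(cells_per_type)
    eligible = sorted(t for t, cells in by_type.items() if len(cells) >= largest)
    if len(eligible) < len(cells_per_type):
        raise ValueError("not enough cell types with sufficient cells")

    chosen_types = rng.sample(eligible, len(cells_per_type))
    dataset = []
    for cell_type, n_cells in zip(chosen_types, cells_per_type):
        # Sample cells of this type without replacement.
        for cell_index in rng.sample(by_type[cell_type], n_cells):
            dataset.append((cell_index, cell_type))
    return dataset


# Toy annotation standing in for a real atlas: 5 types, 50 cells each.
toy_labels = [f"type_{i % 5}" for i in range(250)]
rng = random.Random(0)
balanced = sample_dataset(toy_labels, [10, 10, 10], rng)   # 3 types, equal sizes
imbalanced = sample_dataset(toy_labels, [20, 5], rng)      # 2 types, 4:1 ratio
```

Repeating such draws with different seeds and size vectors yields many datasets whose true number of cell types is known by construction.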

Conclusions: We identify the strengths and weaknesses of each method on multiple criteria, including the deviation of the estimate from the true number of cell types, variability of estimation, clustering concordance of cells with their predefined cell types, running time, and peak memory usage. We then summarise these results into a multi-aspect recommendation for users. The proposed stability-based approach for estimating the number of cell types is implemented in an R package and is freely available from https://github.com/PYangLab/scCCESS.
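Three of the criteria named above can be sketched in a few lines. The abstract does not give formulas, so the definitions here are assumptions: deviation is taken relative to the true number of cell types, variability is the standard deviation of that deviation across repeated datasets, and concordance is measured with the adjusted Rand index, a standard choice that may differ from the measure the study uses. All function names are hypothetical.

```python
from collections import Counter
from math import comb
from statistics import mean, pstdev


def relative_deviation(estimated_k, true_k):
    """Signed deviation of an estimate from the true number of cell types.

    Negative values mean under-estimation, positive values over-estimation.
    """
    return (estimated_k - true_k) / true_k


def summarise_estimates(estimates, true_k):
    """Mean and spread of the deviation over repeated datasets."""
    deviations = [relative_deviation(k, true_k) for k in estimates]
    return {"mean_deviation": mean(deviations), "sd_deviation": pstdev(deviations)}


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between predefined cell types and cluster labels."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare
    # Count pairs of cells placed together in both, in truth, and in prediction.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy values only, not results from the study.
summary = summarise_estimates([7, 8, 9], true_k=8)
perfect = adjusted_rand_index(["a", "a", "b", "b"], [1, 1, 2, 2])
```

Running time and peak memory usage are measured around the clustering call itself and are not shown here.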

Citing Articles

Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation.

Arbatsky M, Vasilyeva E, Sysoeva V, Semina E, Saveliev V, Rubina K Front Bioinform. 2025; 5:1519468.

PMID: 40013100 PMC: 11861183. DOI: 10.3389/fbinf.2025.1519468.


Principled PCA separates signal from noise in omics count data.

Stanley J, Stanley 3rd J, Yang J, Li R, Lindenbaum O, Kobak D bioRxiv. 2025; .

PMID: 39975320 PMC: 11838471. DOI: 10.1101/2025.02.03.636129.


SpatialKNifeY (SKNY): Extending from spatial domain to surrounding area to identify microenvironment features with single-cell spatial omics data.

Sakai S, Nomura R, Nagasawa S, Chi S, Suzuki A, Suzuki Y PLoS Comput Biol. 2025; 21(2):e1012854.

PMID: 39965034 PMC: 11849985. DOI: 10.1371/journal.pcbi.1012854.


CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data.

Liu X, Chapple R, Bennett D, Wright W, Sanjali A, Culp E Cell Genom. 2025; 5(1):100739.

PMID: 39788105 PMC: 11770216. DOI: 10.1016/j.xgen.2024.100739.


Unsupervised multi-scale clustering of single-cell transcriptomes to identify hierarchical structures of cell subtypes.

Song W, Ming C, Forst C, Zhang B Res Sq. 2025; .

PMID: 39764102 PMC: 11703337. DOI: 10.21203/rs.3.rs-5671748/v1.

