
A Cluster Robustness Score for Identifying Cell Subpopulations in Single Cell Gene Expression Datasets from Heterogeneous Tissues and Tumors

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2018 Aug 31
PMID: 30165506
Citations: 6
Abstract

Motivation: A major aim of single cell biology is to identify important cell types, such as stem cells, in heterogeneous tissues and tumors. This is typically done by isolating hundreds of individual cells and measuring the expression levels of multiple genes simultaneously in each cell. Clustering algorithms are then used to group similar single-cell expression profiles into clusters, each representing a distinct cell type. However, many of these clusters result from overfitting: rather than representing biologically meaningful cell types, they describe the intrinsic 'noise' in gene expression levels that arises from limited experimental precision or from the intrinsic randomness of biochemical cellular processes. Consequently, these non-meaningful clusters are the most sensitive to noise: a slight shift in gene expression levels, such as one caused by a repeated measurement, rearranges the grouping of data points so that these clusters break up.

Results: To identify the biologically meaningful clusters, we propose a 'cluster robustness score': we add noise with zero mean and progressively larger variance, and check which clusters are most robust in the sense that they do not mix with their neighbors even at high noise levels. We show that biologically meaningful cell clusters that were manually identified in previously published single cell expression datasets have high robustness scores. These scores are higher than would be expected in corresponding randomized homogeneous datasets with the same expression level statistics. We believe that this scoring system provides a more automated way to identify cell types in heterogeneous tissues and tumors.
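The noise-perturbation idea can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the noise distribution, or the exact score formula, so everything below is an assumption made for illustration: a plain k-means with deterministic farthest-first initialization, Gaussian noise, a Jaccard overlap threshold of 0.8, and a score defined as the largest noise standard deviation at which a cluster still persists. The function names (`kmeans`, `robustness_scores`) and the toy data are hypothetical, and the comparison against randomized homogeneous datasets is not implemented here.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=25):
    """Lloyd's k-means with farthest-first initialization (assumed choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def jaccard(a, b):
    """Jaccard overlap of two non-empty index sets."""
    return len(a & b) / len(a | b)

def robustness_scores(points, k, sigmas, threshold=0.8, repeats=5, seed=0):
    """Score each baseline cluster by the largest noise level it survives.

    A cluster "survives" a noise level if, averaged over `repeats` noisy
    re-clusterings, its best Jaccard overlap with any noisy cluster stays
    at or above `threshold`. Both parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    base = kmeans(points, k)
    clusters = {c: {i for i, lab in enumerate(base) if lab == c} for c in range(k)}
    clusters = {c: members for c, members in clusters.items() if members}
    scores = {c: 0.0 for c in clusters}
    alive = set(clusters)
    for sigma in sorted(sigmas):
        overlap = {c: 0.0 for c in alive}
        for _ in range(repeats):
            # Zero-mean Gaussian noise with standard deviation sigma.
            noisy = [[x + rng.gauss(0.0, sigma) for x in p] for p in points]
            labels = kmeans(noisy, k)
            groups = [{i for i, lab in enumerate(labels) if lab == g} for g in range(k)]
            for c in alive:
                overlap[c] += max(jaccard(clusters[c], g) for g in groups if g)
        alive = {c for c in alive if overlap[c] / repeats >= threshold}
        for c in alive:
            scores[c] = sigma
        if not alive:
            break
    return base, scores

# Toy data: one broad group that k=3 is forced to split, and one tight,
# well-separated group standing in for a distinct cell type.
data_rng = random.Random(1)
broad = [[data_rng.gauss(0, 3), data_rng.gauss(0, 3)] for _ in range(40)]
tight = [[data_rng.gauss(100, 0.01), data_rng.gauss(100, 0.01)] for _ in range(20)]
points = broad + tight
sigmas = [0.1, 0.2, 0.4]
base, scores = robustness_scores(points, k=3, sigmas=sigmas)
print(scores)
```

In this toy setting the tight, well-separated group keeps its membership at every noise level and therefore reaches the maximum score, while the two halves of the artificially split broad group can score no higher and may score lower. Whether they do depends on the chosen threshold and noise range, which would need tuning on real expression data.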

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets.

Goggin S, Zunder E. Genome Biol. 2024; 25(1):242.

PMID: 39285487 PMC: 11406744. DOI: 10.1186/s13059-024-03386-5.


Unravelling cancer subtype-specific driver genes in single-cell transcriptomics data with CSDGI.

Huang M, Ma J, An G, Ye X. PLoS Comput Biol. 2023; 19(12):e1011450.

PMID: 38096269 PMC: 10754467. DOI: 10.1371/journal.pcbi.1011450.


scGPS: Determining Cell States and Global Fate Potential of Subpopulations.

Thompson M, Matsumoto M, Ma T, Senabouth A, Palpant N, Powell J. Front Genet. 2021; 12:666771.

PMID: 34349778 PMC: 8326972. DOI: 10.3389/fgene.2021.666771.


Selecting single cell clustering parameter values using subsampling-based robustness metrics.

Patterson-Cross R, Levine A, Menon V. BMC Bioinformatics. 2021; 22(1):39.

PMID: 33522897 PMC: 7852188. DOI: 10.1186/s12859-021-03957-4.


Detecting Interactive Gene Groups for Single-Cell RNA-Seq Data Based on Co-Expression Network Analysis and Subgraph Learning.

Ye X, Zhang W, Futamura Y, Sakurai T. Cells. 2020; 9(9).

PMID: 32825786 PMC: 7563496. DOI: 10.3390/cells9091938.