
Semisoft Clustering of Single-cell Data

Overview
Specialty Science
Date 2018 Dec 28
PMID 30587579
Citations 27
Abstract

Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from gene expression data on individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells, and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types, we obtain more parsimonious results than those obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain.
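To make the idea of a membership matrix concrete, the following is a minimal sketch of the second step only, under strong simplifying assumptions; it is not the SOUP estimator itself. It assumes the pure cells and their types are already known (the first step, which the abstract says relies on the block structure of the expression similarity matrix, is not reproduced here), takes each cluster center to be the mean of its pure cells, and fits memberships by ordinary least squares followed by clipping and renormalization. The function name `estimate_memberships`, its parameters `pure_idx` and `pure_labels`, and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np


def estimate_memberships(X, pure_idx, pure_labels, K):
    """Illustrative soft-membership estimate (NOT the published SOUP estimator).

    X           : (n_cells, n_genes) expression matrix
    pure_idx    : indices of cells assumed to be pure (hypothetical input)
    pure_labels : type in 0..K-1 of each assumed-pure cell (hypothetical input)
    K           : number of pure cell types
    Returns (theta, centers): theta is (n_cells, K) with rows summing to 1.
    """
    X = np.asarray(X, dtype=float)
    pure_idx = np.asarray(pure_idx)
    pure_labels = np.asarray(pure_labels)

    # Assumption: a cluster center is the mean profile of its pure cells.
    centers = np.vstack(
        [X[pure_idx[pure_labels == k]].mean(axis=0) for k in range(K)]
    )

    # Model each cell as a mixture of centers: X ~ theta @ centers.
    # Solve centers.T @ theta.T ~ X.T for theta.T by least squares.
    theta_t, *_ = np.linalg.lstsq(centers.T, X.T, rcond=None)

    # Stand-in for a proper simplex constraint: clip negatives, renormalize.
    theta = np.clip(theta_t.T, 0.0, None)
    row_sums = np.maximum(theta.sum(axis=1, keepdims=True), 1e-12)
    return theta / row_sums, centers


# Toy, noise-free data: two pure types and two transitional cells.
c0 = np.array([4.0, 0.0, 1.0])
c1 = np.array([0.0, 4.0, 1.0])
X = np.vstack([
    c0, c0,                    # pure cells of type 0
    c1, c1,                    # pure cells of type 1
    0.5 * c0 + 0.5 * c1,       # transitional cell, halfway between types
    0.25 * c0 + 0.75 * c1,     # transitional cell, closer to type 1
])

theta, centers = estimate_memberships(
    X, pure_idx=[0, 1, 2, 3], pure_labels=[0, 0, 1, 1], K=2
)
print(np.round(theta, 3))
```

Each row of `theta` is one cell's soft membership across the K types. In this toy setting the pure cells sit at the corners of the simplex and the transitional cells fall between them, which is the kind of structure a hard clustering would have to force into a single label. Real data are noisy, so a constrained solver would be a more careful choice than clipping.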

Citing Articles

ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets.

Goggin S, Zunder E Genome Biol. 2024; 25(1):242.

PMID: 39285487 PMC: 11406744. DOI: 10.1186/s13059-024-03386-5.


scVIC: deep generative modeling of heterogeneity for scRNA-seq data.

Xiong J, Gong F, Ma L, Wan L Bioinform Adv. 2024; 4(1):vbae086.

PMID: 39027640 PMC: 11256938. DOI: 10.1093/bioadv/vbae086.


Topological and geometric analysis of cell states in single-cell transcriptomic data.

Huynh T, Cang Z Brief Bioinform. 2024; 25(3).

PMID: 38632952 PMC: 11024518. DOI: 10.1093/bib/bbae176.


Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data.

Nwizu C, Hughes M, Ramseier M, Navia A, Shalek A, Fusi N bioRxiv. 2024; .

PMID: 38405697 PMC: 10888887. DOI: 10.1101/2024.02.11.579839.


An introduction to representation learning for single-cell data analysis.

Gunawan I, Vafaee F, Meijering E, Lock J Cell Rep Methods. 2023; 3(8):100547.

PMID: 37671013 PMC: 10475795. DOI: 10.1016/j.crmeth.2023.100547.

