
Linear-time Cluster Ensembles of Large-scale Single-cell RNA-seq and Multimodal Data

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2021 Feb 25
PMID: 33627473
Citations: 6
Abstract

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and the volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, Specter uses landmarks to create a sparse representation of the full data, from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data, which simultaneously measure gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.
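
The landmark idea in the abstract can be illustrated with a minimal sketch of generic landmark-based spectral clustering. This is not Specter's implementation: the function names, the uniform random choice of landmarks, the bandwidth heuristic, the small k-means helper, and the toy data are all assumptions made for illustration, and the cluster ensemble step is left out. Each cell is represented only by its similarities to a few nearby landmarks, so the eigen-decomposition is done on a small landmark-by-landmark matrix and the cost grows linearly with the number of cells.

```python
import numpy as np

def landmark_spectral_embedding(X, n_landmarks=50, n_neighbors=5,
                                n_components=2, seed=0):
    """Hypothetical helper: spectral embedding from a cell-to-landmark matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: landmarks are a uniform random subset of the cells.
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]
    # Squared Euclidean distances from every cell to every landmark (n x p).
    d2 = ((X ** 2).sum(axis=1)[:, None] - 2.0 * X @ landmarks.T
          + (landmarks ** 2).sum(axis=1)[None, :])
    d2 = np.maximum(d2, 0.0)
    # Keep only the nearest landmarks of each cell (sparse representation;
    # stored densely here for brevity).
    nearest = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    d2_near = d2[rows, nearest]
    sigma2 = d2_near.mean() + 1e-12  # assumed bandwidth heuristic
    w = np.exp(-d2_near / (2.0 * sigma2))
    Z = np.zeros((n, n_landmarks))
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    # Scale columns by the inverse square root of their sums.
    col = np.maximum(Z.sum(axis=0), 1e-12)
    Zhat = Z / np.sqrt(col)
    # Eigen-decompose the small p x p matrix instead of an n x n affinity.
    evals, evecs = np.linalg.eigh(Zhat.T @ Zhat)
    order = np.argsort(evals)[::-1][:n_components]
    sing = np.sqrt(np.maximum(evals[order], 1e-12))
    U = Zhat @ evecs[:, order] / sing  # left singular vectors of Zhat
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return U

def kmeans(U, k, n_iter=50):
    """Hypothetical helper: Lloyd's k-means with farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Toy data standing in for an expression matrix: two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 10)),
               rng.normal(6.0, 1.0, size=(200, 10))])
truth = np.repeat([0, 1], 200)

embedding = landmark_spectral_embedding(X)
labels = kmeans(embedding, k=2)
# Cluster labels are arbitrary, so score agreement up to a label swap.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with ground truth: {accuracy:.2f}")
```

Because every cell contributes a row to the cell-to-landmark matrix, no cell is discarded, in contrast to approaches that cluster only a subsample. A scalable version would store that matrix in a sparse format and avoid computing all cell-to-landmark distances densely.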

Citing Articles

GSTRPCA: irregular tensor singular value decomposition for single-cell multi-omics data clustering.

Cui L, Guo G, Ng M, Zou Q, Qiu Y. Brief Bioinform. 2024; 26(1).

PMID: 39680741 PMC: 11647523. DOI: 10.1093/bib/bbae649.


scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Qiu Y, Guo D, Zhao P, Zou Q. Brief Bioinform. 2024; 25(3).

PMID: 38754408 PMC: 11097994. DOI: 10.1093/bib/bbae228.


Clustering of single-cell multi-omics data with a multimodal deep learning method.

Lin X, Tian T, Wei Z, Hakonarson H. Nat Commun. 2022; 13(1):7705.

PMID: 36513636 PMC: 9748135. DOI: 10.1038/s41467-022-35031-9.


Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data.

Wei N, Nie Y, Liu L, Zheng X, Wu H. PLoS Comput Biol. 2022; 18(12):e1010753.

PMID: 36469543 PMC: 9754601. DOI: 10.1371/journal.pcbi.1010753.


Metacells untangle large and complex single-cell transcriptome networks.

Bilous M, Tran L, Cianciaruso C, Gabriel A, Michel H, Carmona S. BMC Bioinformatics. 2022; 23(1):336.

PMID: 35963997 PMC: 9375201. DOI: 10.1186/s12859-022-04861-1.

