
Benchmarking UMI-based Single-cell RNA-seq Preprocessing Workflows

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2021 Dec 15
PMID: 34906205
Citations: 18
Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
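As a rough illustration of what "assigning reads to genes to create count matrices" means for UMI-based data, the sketch below collapses already-assigned reads into molecule counts. It is a toy example, not the method of any workflow named in this abstract: the function name `count_umis`, the record layout (cell barcode, UMI, gene), and all values are assumptions made for illustration, and only exact-match deduplication is shown.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse (cell_barcode, umi, gene) records into per-cell gene counts.

    Hypothetical helper: reads that share the same cell barcode, UMI, and
    gene are treated as copies of one molecule and counted once.
    """
    molecules = set(reads)  # exact-match deduplication only (an assumption)
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in molecules:
        counts[cell][gene] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {cell: dict(genes) for cell, genes in counts.items()}


# Toy records: (cell barcode, UMI, gene). All values are made up.
toy_reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),  # duplicate read of the same molecule
    ("AAAC", "GGCTA", "GeneA"),
    ("AAAC", "GGCTA", "GeneB"),
    ("CCGT", "TTGCA", "GeneB"),
]
toy_counts = count_umis(toy_reads)
```

Real workflows additionally handle barcode and UMI sequencing errors, multi-mapping reads, and cell calling, which is where much of the variation between tools is likely to arise.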

Results: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets of varying biological complexity generated on the CEL-Seq2 and 10x Chromium platforms. We compare these workflows both directly, in terms of their quantification properties, and indirectly, through their impact on normalization and clustering, by evaluating the performance of different method combinations. Although the workflows vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods almost all combinations produce clustering results that agree well with the known cell type labels that served as the ground truth in our analysis.
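One common way to quantify how well a clustering agrees with known cell type labels is the adjusted Rand index (ARI). The abstract does not state which agreement measure was used, so the choice of ARI here is an assumption, and the function below is a minimal pure-Python sketch rather than the study's evaluation code. The labels in the example are invented.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells.

    Returns 1.0 for identical partitions (up to relabeling) and values
    near 0.0 for agreement expected by chance.
    """
    if len(truth) != len(predicted):
        raise ValueError("label vectors must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no cell pairs to compare

    # Count cell pairs that are grouped together in each labeling.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = sum_truth * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_pairs - expected) / (max_index - expected)


# Invented labels: six cells, three known cell types.
known_types = ["T", "T", "B", "B", "NK", "NK"]
perfect = adjusted_rand_index(known_types, [0, 0, 1, 1, 2, 2])
partial = adjusted_rand_index(known_types, [0, 0, 1, 1, 1, 2])
```

Because ARI ignores the cluster names themselves, it can be applied to the output of any preprocessing, normalization, and clustering combination as long as the cells are in the same order as the reference labels.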

Conclusions: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.

Citing Articles

Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation.

Arbatsky M, Vasilyeva E, Sysoeva V, Semina E, Saveliev V, Rubina K. Front Bioinform. 2025; 5:1519468.

PMID: 40013100 PMC: 11861183. DOI: 10.3389/fbinf.2025.1519468.


Integrated single-cell and bulk RNA sequencing reveals immune-related SPP1+ macrophages as a potential strategy for predicting the prognosis and treatment of liver fibrosis and hepatocellular carcinoma.

Li B, Hu J, Xu H. Front Immunol. 2024; 15:1455383.

PMID: 39635536 PMC: 11615077. DOI: 10.3389/fimmu.2024.1455383.


Uncovering functional lncRNAs by scRNA-seq with ELATUS.

Goni E, Mas A, Gonzalez J, Abad A, Santisteban M, Fortes P. Nat Commun. 2024; 15(1):9709.

PMID: 39521797 PMC: 11550465. DOI: 10.1038/s41467-024-54005-7.


Scywalker: scalable end-to-end data analysis workflow for long-read single-cell transcriptome sequencing.

De Rijk P, Watzeels T, Kucukali F, Van Dongen J, Faura J, Willems P. Bioinformatics. 2024; 40(9).

PMID: 39254601 PMC: 11419950. DOI: 10.1093/bioinformatics/btae549.


Single-cell omics: experimental workflow, data analyses and applications.

Sun F, Li H, Sun D, Fu S, Gu L, Shao X. Sci China Life Sci. 2024; 68(1):5-102.

PMID: 39060615 DOI: 10.1007/s11427-023-2561-0.

