
Demystifying "drop-outs" in Single-cell UMI Data

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2020 Aug 9
PMID: 32762710
Citations: 54
Abstract

Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or "drop-outs." Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.
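The claim that most drop-outs disappear once cell-type heterogeneity is resolved can be illustrated with a small simulation. The sketch below is not HIPPO itself. It assumes, for illustration only, that UMI counts within a homogeneous cell type follow a Poisson distribution, so the expected zero proportion of a gene with mean count m is exp(-m). The function name `zero_inflation_scores`, the simulated rates, and the two-type mixture are all hypothetical choices made for this example.

```python
import numpy as np


def zero_inflation_scores(counts):
    """Observed minus Poisson-expected zero proportion, per gene.

    counts: array of shape (n_cells, n_genes) holding UMI counts.
    The Poisson reference exp(-mean) is an assumption of this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    gene_mean = counts.mean(axis=0)
    observed_zero = (counts == 0).mean(axis=0)
    expected_zero = np.exp(-gene_mean)  # P(X = 0) for a Poisson with this mean
    return observed_zero - expected_zero


rng = np.random.default_rng(0)
n_cells = 2000

# Two hypothetical cell types of equal size.
labels = np.repeat([0, 1], n_cells // 2)

# Gene 0: same Poisson rate in every cell (no heterogeneity).
homogeneous = rng.poisson(2.0, size=n_cells)

# Gene 1: rate depends on cell type, so the pooled data are a mixture.
mixed = rng.poisson(np.where(labels == 0, 0.1, 3.9))

counts = np.column_stack([homogeneous, mixed])

# Scores computed on all cells pooled together.
pooled = zero_inflation_scores(counts)

# Scores computed separately inside each cell type (rows: type, cols: gene).
within = np.vstack(
    [zero_inflation_scores(counts[labels == k]) for k in (0, 1)]
)

print("pooled excess zeros per gene:", np.round(pooled, 3))
print("within-type excess zeros:\n", np.round(within, 3))
```

In the pooled data the heterogeneous gene shows a large excess of zeros relative to the Poisson expectation, while within each simulated cell type the excess is close to zero for both genes. A gene with a large pooled score is therefore a candidate marker of unresolved heterogeneity rather than evidence of technical drop-out, which is the intuition behind selecting features by zero proportion before clustering.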

Citing Articles

Batch correcting single-cell spatial transcriptomics count data with Crescendo improves visualization and detection of spatial gene patterns.

Millard N, Chen J, Palshikar M, Pelka K, Spurrell M, Price C. Genome Biol. 2025; 26(1):36.

PMID: 40001084 PMC: 11863647. DOI: 10.1186/s13059-025-03479-9.


PbImpute: Precise Zero Discrimination and Balanced Imputation in Single-Cell RNA Sequencing Data.

Zhang Y, Wang Y, Liu X, Feng X. J Chem Inf Model. 2025; 65(5):2670-2684.

PMID: 39957720 PMC: 11898086. DOI: 10.1021/acs.jcim.4c02125.


OneSC: a computational platform for recapitulating cell state transitions.

Peng D, Cahan P. Bioinformatics. 2024; 40(12).

PMID: 39570626 PMC: 11630913. DOI: 10.1093/bioinformatics/btae703.


Evolutionary innovations in germline biology of placental mammals identified by transcriptomics of first-wave spermatogenesis in opossum.

Marshall K, Stadtmauer D, Maziarz J, Wagner G, Lesch B. Dev Cell. 2024; 60(4):646-664.e8.

PMID: 39536760 PMC: 11859772. DOI: 10.1016/j.devcel.2024.10.013.


Shrinkage estimation of gene interaction networks in single-cell RNA sequencing data.

Vo D, Thorne T. BMC Bioinformatics. 2024; 25(1):339.

PMID: 39462345 PMC: 11515282. DOI: 10.1186/s12859-024-05946-9.

