
A Benchmark of Batch-effect Correction Methods for Single-cell RNA Sequencing Data

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2020 Jan 18
PMID: 31948481
Citations: 452
Abstract

Background: Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.

Results: We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics, namely kBET (k-nearest neighbor batch-effect test), LISI (local inverse Simpson's index), ASW (average silhouette width), and ARI (adjusted Rand index). We also investigate the use of batch-corrected data to study differential gene expression.
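The abstract only names the evaluation metrics. As a rough illustration of what two of them measure, here is a minimal pure-Python sketch of the adjusted Rand index (ARI), which compares a clustering of the corrected data against known labels, and a simplified local inverse Simpson's index (LISI), which scores how many distinct labels appear in each cell's neighborhood. The function names `adjusted_rand_index` and `simple_lisi` are hypothetical, and this is not the benchmark's own code. The LISI variant is a deliberate simplification: it weights a fixed k-nearest-neighbor set uniformly, whereas the LISI described alongside Harmony weights neighbors with a Gaussian kernel tuned to a target perplexity.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    # Pair counts within each contingency-table cell and each marginal.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total      # index expected under random labeling
    max_index = (sum_a + sum_b) / 2       # upper bound used by the ARI formula
    if max_index == expected:
        return 1.0  # degenerate labelings: treated as perfect agreement by convention
    return (sum_ij - expected) / (max_index - expected)


def simple_lisi(points, labels, k):
    """Mean inverse Simpson's index of `labels` over each point's k nearest neighbors.

    Simplified sketch: uniform weights over a fixed k-NN set found by brute force.
    On batch labels, values near the number of batches suggest good mixing;
    on cell-type labels, values near 1 suggest pure neighborhoods.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # k nearest neighbors of point i by Euclidean distance, excluding i itself
        neigh = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))[:k]
        counts = Counter(labels[j] for j in neigh)
        simpson = sum((c / k) ** 2 for c in counts.values())
        scores.append(1.0 / simpson)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D embedding: batches A and B interleaved within two tight groups.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    batch = ["A", "B", "A", "B"]
    print(simple_lisi(pts, batch, k=2))  # 2.0: every neighborhood holds both batches
    print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # ~0.571, i.e. 4/7
```

For anything beyond illustration, established implementations (for example scikit-learn's `adjusted_rand_score` and `silhouette_score`) are the safer choice; kBET is omitted here because it additionally involves a chi-squared test of neighborhood batch composition.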

Conclusion: Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.

Citing Articles

A single-nucleus and spatial transcriptomic atlas of the COVID-19 liver reveals topological, functional, and regenerative organ disruption in patients.

Pita-Juarez Y, Karagkouni D, Kalavros N, Melms J, Niezen S, Delorey T Genome Biol. 2025; 26(1):56.

PMID: 40087773 DOI: 10.1186/s13059-025-03499-5.


Feature selection methods affect the performance of scRNA-seq data integration and querying.

Zappia L, Richter S, Ramirez-Suastegui C, Kfuri-Rubens R, Vornholz L, Wang W Nat Methods. 2025.

PMID: 40082610 DOI: 10.1038/s41592-025-02624-3.


A dataset of single-cell transcriptomic atlas of Bama pig and potential marker genes across seven tissues.

Chen L, Tong X, Wu Y, Liu C, Tang C, Qi X BMC Genom Data. 2025; 26(1):16.

PMID: 40075302 PMC: 11899051. DOI: 10.1186/s12863-025-01308-3.


Composite quantile regression approach to batch effect correction in microbiome data.

Park J, Park T Front Microbiol. 2025; 16:1484183.

PMID: 40071205 PMC: 11893821. DOI: 10.3389/fmicb.2025.1484183.


WCSGNet: a graph neural network approach using weighted cell-specific networks for cell-type annotation in scRNA-seq.

Wang Y, Du P Front Genet. 2025; 16:1553352.

PMID: 40034748 PMC: 11872911. DOI: 10.3389/fgene.2025.1553352.

