
PyComBat, a Python Tool for Batch Effects Correction in High-throughput Molecular Data Using Empirical Bayes Methods

Overview
Publisher Biomed Central
Specialty Biology
Date 2023 Dec 6
PMID 38057718
Abstract

Background: Variability in datasets is not only the product of biological processes: it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting those technical biases, called batch effects, in microarray and RNA-Seq expression data, respectively.

Results: In this technical note, we present a new Python implementation of ComBat and ComBat-Seq. While the mathematical framework is strictly the same, we show here that our implementations: (i) give similar results in terms of batch effect correction; (ii) are as fast as or faster than the original implementations in R; and (iii) offer new tools for the bioinformatics community to participate in its development. pyComBat is implemented in Python and is distributed under the GPL-3.0 license (https://www.gnu.org/licenses/gpl-3.0.en.html) as a module of the inmoose package. Source code is available at https://github.com/epigenelabs/inmoose and the Python package at https://pypi.org/project/inmoose.

Conclusions: We present a new Python implementation of state-of-the-art tools ComBat and ComBat-Seq for the correction of batch effects in microarray and RNA-Seq data. This new implementation, based on the same mathematical frameworks as ComBat and ComBat-Seq, offers similar power for batch effect correction, at reduced computational cost.
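
To make the idea of batch effect correction concrete, here is a minimal sketch of a per-gene location and scale adjustment, the kind of shift-and-rescale step that ComBat builds on. It is not the pyComBat implementation: it ignores covariates, omits the empirical Bayes shrinkage of batch parameters, and does not cover the negative binomial model used by ComBat-Seq for counts. The function name `adjust_location_scale` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def adjust_location_scale(values, batches):
    """Toy per-gene batch adjustment: align each batch's mean and spread.

    Assumption: a simplified illustration only, with no covariates and
    no empirical Bayes shrinkage, so it is not ComBat itself.
    """
    # Group sample positions by batch label.
    by_batch = defaultdict(list)
    for i, b in enumerate(batches):
        by_batch[b].append(i)

    grand_mean = mean(values)
    batch_mean = {b: mean([values[i] for i in idx]) for b, idx in by_batch.items()}
    batch_sd = {b: pstdev([values[i] for i in idx]) for b, idx in by_batch.items()}

    # Pooled within-batch standard deviation, used as the common target scale.
    residuals = [(v - batch_mean[b]) ** 2 for v, b in zip(values, batches)]
    pooled_sd = sqrt(mean(residuals))

    adjusted = []
    for v, b in zip(values, batches):
        sd = batch_sd[b]
        # Standardize within the batch; a constant batch maps to zero.
        z = (v - batch_mean[b]) / sd if sd > 0 else 0.0
        # Move back to the common location and scale.
        adjusted.append(grand_mean + pooled_sd * z)
    return adjusted


# One gene measured in two batches with a large technical offset.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch_labels = ["a", "a", "a", "b", "b", "b"]
corrected = adjust_location_scale(expression, batch_labels)
```

After the adjustment, both batches share the same mean and spread for this gene. In real data such a naive correction can also remove biological signal when batches are confounded with the condition of interest, which is one reason the full methods model covariates explicitly.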

Citing Articles

AI Model for Predicting Anti-PD1 Response in Melanoma Using Multi-Omics Biomarkers.

Gschwind A, Ossowski S Cancers (Basel). 2025; 17(5).

PMID: 40075562 PMC: 11899402. DOI: 10.3390/cancers17050714.


Exploring NLRP3-related phenotypic fingerprints in human macrophages using Cell Painting assay.

Herring M, Sarndahl E, Kotlyar O, Scherbak N, Engwall M, Karlsson R iScience. 2025; 28(3):111961.

PMID: 40040812 PMC: 11876907. DOI: 10.1016/j.isci.2025.111961.


Applying machine learning to high-dimensional proteomics datasets for the identification of Alzheimer's disease biomarkers.

Ivarsson Orrelid C, Rosberg O, Weiner S, Johansson F, Gobom J, Zetterberg H Fluids Barriers CNS. 2025; 22(1):23.

PMID: 40033432 PMC: 11874791. DOI: 10.1186/s12987-025-00634-z.


Stress-responsive transcription factor families are key components of the core abiotic stress response in maize.

Pardo A, Pardo J, VanBuren R bioRxiv. 2025; .

PMID: 40027706 PMC: 11870519. DOI: 10.1101/2025.02.15.638452.


Improving musculoskeletal care with AI enhanced triage through data driven screening of referral letters.

Maarseveen T, Glas H, Veris-van Dieren J, van den Akker E, Knevel R NPJ Digit Med. 2025; 8(1):98.

PMID: 39948271 PMC: 11825706. DOI: 10.1038/s41746-025-01495-4.

