
HarmonizR Enables Data Harmonization Across Independent Proteomic Datasets with Appropriate Handling of Missing Values

Overview
Journal Nat Commun
Specialty Biology
Date 2022 Jun 20
PMID 35725563
Abstract

Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four datasets analyzed as examples, with up to 23 batches, demonstrated successful data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better in detecting significant proteins. HarmonizR is an efficient, missing-data-tolerant tool for reducing experimental variance and is easily adjustable to individual dataset properties and user preferences.
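The matrix dissection idea described in the abstract can be illustrated with a small Python sketch. This is not HarmonizR's implementation: the function names (`dissect_by_batch_presence`, `center_batches`, `harmonize`) and the `min_values` threshold are hypothetical, and a simple per-batch mean centering stands in for ComBat or limma's removeBatchEffect(). The sketch only shows how features might be grouped by the batches in which they are observed, so that each group can be adjusted without imputing missing values.

```python
import math
from collections import defaultdict

def dissect_by_batch_presence(matrix, batches, min_values=2):
    """Group row indices by the tuple of batches with >= min_values observed values.

    Hypothetical helper; the threshold of 2 is an assumption for this sketch.
    """
    batch_ids = sorted(set(batches))
    groups = defaultdict(list)
    for row_idx, row in enumerate(matrix):
        present = tuple(
            b for b in batch_ids
            if sum(1 for v, vb in zip(row, batches)
                   if vb == b and not math.isnan(v)) >= min_values
        )
        groups[present].append(row_idx)
    return dict(groups)

def center_batches(row, batches, present):
    """Stand-in adjustment (NOT ComBat or limma): shift each present batch so
    its mean equals the mean of all observed values in the present batches."""
    observed = [v for v, b in zip(row, batches)
                if b in present and not math.isnan(v)]
    overall = sum(observed) / len(observed)
    out = list(row)
    for b in present:
        vals = [v for v, vb in zip(row, batches)
                if vb == b and not math.isnan(v)]
        shift = overall - sum(vals) / len(vals)
        for i, (v, vb) in enumerate(zip(row, batches)):
            if vb == b and not math.isnan(v):
                out[i] = v + shift
    return out

def harmonize(matrix, batches, min_values=2):
    """Adjust each sub-matrix separately; missing values stay missing."""
    result = [list(r) for r in matrix]
    groups = dissect_by_batch_presence(matrix, batches, min_values)
    for present, rows in groups.items():
        if len(present) < 2:
            continue  # fewer than two usable batches: left unchanged in this sketch
        for r in rows:
            result[r] = center_batches(matrix[r], batches, present)
    return result

nan = math.nan
batches = ["A", "A", "B", "B", "C", "C"]  # batch label per sample column
matrix = [
    [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],  # observed in all three batches
    [2.0, 4.0, nan, nan, 6.0, 8.0],   # observed in batches A and C only
    [1.0, 2.0, nan, nan, nan, nan],   # observed in batch A only
]
groups = dissect_by_batch_presence(matrix, batches)
harmonized = harmonize(matrix, batches)
```

In this toy input, each row falls into its own sub-matrix because each has a different pattern of batch presence. Rows with data in at least two batches are adjusted using only the values that exist, and no missing value is filled in.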

Citing Articles

HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation.

Schlumbohm S, Neumann J, Neumann P BMC Bioinformatics. 2025; 26(1):47.

PMID: 39934730 PMC: 11817103. DOI: 10.1186/s12859-025-06073-9.


Discovery of a sushi domain-containing protein 2-positive phenotype in circulating tumor cells of metastatic breast cancer patients.

Bartkowiak K, Mohammadi P, Nissen P, Werner S, Agorku D, Andreas A Sci Rep. 2025; 15(1):3913.

PMID: 39890941 PMC: 11785953. DOI: 10.1038/s41598-025-87122-4.


Thinking points for effective batch correction on biomedical data.

Hui H, Kong W, Goh W Brief Bioinform. 2024; 25(6).

PMID: 39397427 PMC: 11471903. DOI: 10.1093/bib/bbae515.


Multiomic profiling of medulloblastoma reveals subtype-specific targetable alterations at the proteome and N-glycan level.

Godbole S, Voss H, Gocke A, Schlumbohm S, Schumann Y, Peng B Nat Commun. 2024; 15(1):6237.

PMID: 39043693 PMC: 11266559. DOI: 10.1038/s41467-024-50554-z.


Transcranial focused ultrasound to V5 enhances human visual motion brain-computer interface by modulating feature-based attention.

Kosnoff J, Yu K, Liu C, He B Nat Commun. 2024; 15(1):4382.

PMID: 38862476 PMC: 11167030. DOI: 10.1038/s41467-024-48576-8.

