
BatMan: Mitigating Batch Effects Via Stratification for Survival Outcome Prediction

Overview
Date 2023 Jun 19
PMID 37335961
Abstract

Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed for sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batch effects by including batch as a covariate alongside sample group in a linear regression. In survival prediction, however, ComBat is applied without definable sample groups for the survival outcome, and it is performed as a separate step before survival regression, even though the outcome itself may be confounded with batch. To address these issues, we propose a new method called BATch MitigAtion via stratificatioN (BatMan). It treats batches as strata in survival regression and uses variable selection methods such as regularized regression to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a resampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data and (2) the performance of both methods can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from The Cancer Genome Atlas and find that BatMan outperforms ComBat while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available on GitHub at LXQin/PRECISION.survival.
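The idea of adjusting for batches as strata can be illustrated with a small sketch. The code below is not the authors' R implementation; it is a minimal pure-Python illustration under stated assumptions: a Cox partial log-likelihood with Breslow-style risk sets, risk sets formed only within each batch, and an L1 (lasso) penalty standing in for "regularized regression". All function names (`cox_partial_loglik`, `stratified_cox_loglik`, `penalized_objective`) are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood for a single stratum (Breslow-style risk sets)."""
    # Linear predictor for each subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: subjects in this stratum still under observation at t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        loglik += eta[i] - math.log(risk_sum)
    return loglik


def stratified_cox_loglik(beta, X, time, event, batch):
    """Sum of per-batch partial log-likelihoods with shared coefficients.

    Each batch keeps its own (unspecified) baseline hazard, so subjects are
    only ever compared with subjects from the same batch.
    """
    groups = defaultdict(list)
    for idx, b in enumerate(batch):
        groups[b].append(idx)
    total = 0.0
    for idxs in groups.values():
        total += cox_partial_loglik(
            beta,
            [X[i] for i in idxs],
            [time[i] for i in idxs],
            [event[i] for i in idxs],
        )
    return total


def penalized_objective(beta, X, time, event, batch, lam):
    """Assumed lasso-type objective: scaled negative log-likelihood plus L1 penalty."""
    n = len(time)
    l1 = sum(abs(b) for b in beta)
    return -stratified_cox_loglik(beta, X, time, event, batch) / n + lam * l1


# Toy data (made up for illustration): one feature, two batches, no censoring.
X = [[0.5], [1.0], [-0.3], [0.2], [0.8]]
time = [1.0, 2.0, 3.0, 1.5, 2.5]
event = [1, 1, 1, 1, 1]
batch = ["A", "A", "A", "B", "B"]

stratified = stratified_cox_loglik([0.0], X, time, event, batch)
pooled = stratified_cox_loglik([0.0], X, time, event, ["all"] * len(time))
```

With all samples placed in a single stratum, the function reduces to the ordinary Cox partial log-likelihood, which makes the contrast concrete: stratification changes who is in each risk set, not the model for the coefficients. In practice the objective would be minimized over `beta` with a dedicated solver rather than evaluated by hand.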
