Covariance Adjustment for Batch Effect in Gene Expression Data
Batch bias has been observed in many microarray gene expression studies that involve multiple batches of samples. A serious batch effect can alter not only the distribution of individual genes but also the relationships between genes. Although some efforts have been made to remove such bias, relatively little work has addressed multivariate approaches, mainly because the high-dimensional nature of gene expression data makes the analysis difficult. We propose a multivariate batch adjustment method that effectively eliminates inter-gene batch effects. The proposed method uses high-dimensional sparse covariance estimation based on a factor model and hard thresholding. Another important aspect of the method is that, if one of the batches is known to have been produced under superior conditions, the other batches can be adjusted to resemble this target batch. We study the high-dimensional asymptotic properties of the proposed estimator and compare the performance of the proposed method with some popular existing methods using simulated data and gene expression data sets.
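The following is a minimal sketch of the two ingredients described above, written under stated assumptions rather than as the authors' implementation. The covariance estimator is a POET-style recipe (assumed here): the top principal components of the sample covariance serve as the factor part, and off-diagonal entries of the residual covariance are hard-thresholded at a level tau. The batch adjustment is assumed to take the whiten-then-recolor form x* = Σ_t^{1/2} Σ_b^{-1/2} (x − μ_b) + μ_t, which maps a batch b onto the target batch t. The function names `factor_threshold_cov`, `sym_power` and `adjust_to_target`, and the parameters `n_factors`, `tau` and `eps`, are hypothetical and illustrative only.

```python
import numpy as np


def factor_threshold_cov(X, n_factors=2, tau=0.1):
    """Sparse covariance estimate for an (n_samples, n_genes) matrix X.

    Assumed POET-style recipe: low-rank factor part from the leading
    principal components plus a hard-thresholded residual covariance.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)            # sample covariance (genes x genes)
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:n_factors]   # indices of the largest eigenvalues
    loadings = evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))
    low_rank = loadings @ loadings.T            # factor (common) component
    R = S - low_rank                            # residual (idiosyncratic) covariance
    # Hard thresholding: zero out small residual entries, then restore the diagonal.
    R_thr = np.where(np.abs(R) >= tau, R, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))
    return low_rank + R_thr


def sym_power(A, p, eps=1e-8):
    """Matrix power of a symmetric matrix via eigendecomposition.

    Eigenvalues are clipped at eps because a thresholded estimate is not
    guaranteed to be positive definite (an assumption of this sketch).
    """
    w, V = np.linalg.eigh((A + A.T) / 2)
    w = np.maximum(w, eps)
    return (V * w**p) @ V.T


def adjust_to_target(X_batch, X_target, n_factors=2, tau=0.1):
    """Map X_batch so its mean and covariance resemble those of X_target."""
    mu_b = X_batch.mean(axis=0)
    mu_t = X_target.mean(axis=0)
    S_b = factor_threshold_cov(X_batch, n_factors, tau)
    S_t = factor_threshold_cov(X_target, n_factors, tau)
    A = sym_power(S_t, 0.5) @ sym_power(S_b, -0.5)  # whiten, then recolor
    # Rows are samples, so x* = A (x - mu_b) + mu_t becomes (X - mu_b) A^T + mu_t.
    return (X_batch - mu_b) @ A.T + mu_t
```

As a sanity check, setting `tau=0.0` disables thresholding, so the estimate reduces to the sample covariance; with more samples than genes the adjusted batch then reproduces the target batch's sample mean and covariance exactly. In a genuinely high-dimensional setting (more genes than samples), the sample covariance is singular, and a sparse, well-conditioned estimate such as the one sketched above is what makes the inverse square root usable.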