
Covariance Adjustment for Batch Effect in Gene Expression Data

Overview
Journal: Stat Med
Publisher: Wiley
Specialty: Public Health
Date: 2014 Apr 2
PMID: 24687561
Citations: 8
Abstract

Batch bias has been found in many microarray gene expression studies that involve multiple batches of samples. A serious batch effect can alter not only the distribution of individual genes but also the inter-gene relationships. Although some efforts have been made to remove such bias, multivariate approaches have seen relatively little development, mainly because of the analytical difficulty posed by the high-dimensional nature of gene expression data. We propose a multivariate batch adjustment method that effectively eliminates inter-gene batch effects. The proposed method uses high-dimensional sparse covariance estimation based on a factor model and hard thresholding. Another important aspect of the proposed method is that, if one of the batches is known to have been produced under superior conditions, the other batches can be adjusted so that they resemble that target batch. We study the high-dimensional asymptotic properties of the proposed estimator and compare the performance of the proposed method with that of some popular existing methods on simulated data and gene expression data sets.
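
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the estimator, so the code below assumes one common way to combine a factor model with hard thresholding (leading principal components for the factor part, hard-thresholded off-diagonal residual covariance), and assumes the adjustment maps a source batch onto the mean and covariance of the target batch. The function names and the parameters `n_factors`, `tau`, and `floor` are hypothetical.

```python
import numpy as np

def factor_threshold_cov(X, n_factors=2, tau=0.1):
    """Assumed estimator: low-rank factor part plus hard-thresholded residual."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)             # sample covariance (genes x genes)
    vals, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_factors]     # indices of the leading components
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    resid = S - low_rank
    keep = np.abs(resid) >= tau                  # hard thresholding of small entries
    np.fill_diagonal(keep, True)                 # diagonal entries are never zeroed
    return low_rank + resid * keep

def sym_power(A, power, floor=1e-8):
    """Matrix power of a symmetric matrix; eigenvalues are floored for stability."""
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    vals = np.clip(vals, floor, None)
    return (vecs * vals**power) @ vecs.T

def adjust_to_target(source, target, n_factors=2, tau=0.1):
    """Shift and rescale `source` (samples x genes) toward the target batch."""
    cov_s = factor_threshold_cov(source, n_factors, tau)
    cov_t = factor_threshold_cov(target, n_factors, tau)
    transform = sym_power(cov_s, -0.5) @ sym_power(cov_t, 0.5)
    return (source - source.mean(axis=0)) @ transform + target.mean(axis=0)

# Simulated batches sharing one latent factor (illustrative data only).
rng = np.random.default_rng(0)
loadings = rng.normal(size=(1, 20))
target = rng.normal(size=(40, 1)) @ loadings + rng.normal(size=(40, 20))
source = 1.5 * (rng.normal(size=(30, 1)) @ loadings) + 2.0 * rng.normal(size=(30, 20)) + 1.0
adjusted = adjust_to_target(source, target, n_factors=1, tau=0.2)
```

In this sketch the adjusted batch has exactly the target batch's gene-wise means, and its covariance approaches the target's to the extent that the estimated covariances are accurate. Flooring the eigenvalues is a pragmatic choice here, since a hard-thresholded matrix is not guaranteed to be positive definite.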

Citing Articles

Efficient multi-phenotype genome-wide analysis identifies genetic associations for unsupervised deep-learning-derived high-dimensional brain imaging phenotypes.

Guo B, Xie Z, He W, Islam S, Gottlieb A, Chen H. medRxiv. 2024.

PMID: 39677479 PMC: 11643246. DOI: 10.1101/2024.12.06.24318618.


Topological analysis of interaction patterns in cancer-specific gene regulatory network: persistent homology approach.

Masoomy H, Askari B, Tajik S, Rizi A, Jafari G. Sci Rep. 2021; 11(1):16414.

PMID: 34385492 PMC: 8361050. DOI: 10.1038/s41598-021-94847-5.


Stability of Imbalanced Triangles in Gene Regulatory Networks of Cancerous and Normal Cells.

Rizi A, Zamani M, Shirazi A, Jafari G, Kertesz J. Front Physiol. 2021; 11:573732.

PMID: 33551827 PMC: 7854919. DOI: 10.3389/fphys.2020.573732.


Correcting nuisance variation using Wasserstein distance.

Tabak G, Fan M, Yang S, Hoyer S, Davis G. PeerJ. 2020; 8:e8594.

PMID: 32161688 PMC: 7050548. DOI: 10.7717/peerj.8594.


A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies.

Nyamundanda G, Poudel P, Patil Y, Sadanandam A. Sci Rep. 2017; 7(1):10849.

PMID: 28883548 PMC: 5589920. DOI: 10.1038/s41598-017-11110-6.

