
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis

Overview
Journal: PLoS Genet
Specialty: Genetics
Date: 2007 Oct 3
PMID: 17907809
Citations: 1042
Abstract

It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce "surrogate variable analysis" (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
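The idea of capturing unmodeled heterogeneity can be illustrated with a deliberately simplified sketch. It is not the SVA algorithm from the paper, whose additional steps are not reproduced here. The sketch assumes that hidden factors show up as dominant patterns in the residuals left after regressing expression on the modeled variables, and that the leading singular vectors of those residuals can serve as extra covariates. The function name `residual_surrogates`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

def residual_surrogates(expr, design, n_sv):
    """Hypothetical helper: estimate sample-level covariates from residuals.

    expr   : genes x samples expression matrix
    design : samples x p matrix of modeled variables (including intercept)
    n_sv   : number of residual patterns to keep (chosen by hand here)
    """
    # Least-squares fit of every gene on the modeled variables.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef  # samples x genes
    # Left singular vectors are per-sample patterns shared across genes.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_sv]

# Simulated study: a two-group comparison plus one unmeasured factor.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 20
group = np.repeat([0.0, 1.0], n_samples // 2)  # variable of interest
hidden = rng.normal(size=n_samples)            # unmeasured factor
expr = rng.normal(size=(n_genes, n_samples))
expr[:200] += 2.0 * hidden                     # hidden factor hits 200 genes
expr[:50] += 1.0 * group                       # true group effect on 50 genes

design = np.column_stack([np.ones(n_samples), group])
sv = residual_surrogates(expr, design, n_sv=1)

# Compare the estimate with the part of the hidden factor that the
# modeled variables cannot explain.
hidden_fit, *_ = np.linalg.lstsq(design, hidden, rcond=None)
hidden_resid = hidden - design @ hidden_fit
corr = abs(np.corrcoef(sv[:, 0], hidden_resid)[0, 1])

# The estimated covariate is appended to the design for downstream tests.
augmented = np.column_stack([design, sv])
```

In this toy setting the estimated covariate would then enter a standard per-gene regression alongside the variable of interest. Here `n_sv` is fixed by hand, whereas a real analysis would need a principled way to choose it.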

Citing Articles

Composite quantile regression approach to batch effect correction in microbiome data.

Park J, Park T. Front Microbiol. 2025; 16:1484183.

PMID: 40071205 PMC: 11893821. DOI: 10.3389/fmicb.2025.1484183.


Estropausal gut microbiota transplant improves measures of ovarian function in adult mice.

Kim M, Wang J, Pilley S, Lu R, Xu A, Kim Y. bioRxiv. 2025.

PMID: 40060387 PMC: 11888174. DOI: 10.1101/2024.05.03.592475.


Perturbations in the neuroactive ligand-receptor interaction and renin angiotensin system pathways are associated with cancer-related cognitive impairment.

Chan R, Walker A, Vardy J, Chan A, Oppegaard K, Conley Y. Support Care Cancer. 2025; 33(4):254.

PMID: 40047999 PMC: 11885406. DOI: 10.1007/s00520-025-09317-9.


SpaFun: Discovering Domain-specific Spatial Expression Patterns and New Disease-Relevant Genes using Functional Principal Component Analysis.

Jiang X, Guo Y, Guo L, Zhong L, Wang J, Xiao G. bioRxiv. 2025.

PMID: 40027691 PMC: 11870527. DOI: 10.1101/2025.02.17.638766.


Deciphering the coordinated roles of the host genome, duodenal mucosal genes, and microbiota in regulating complex traits in chickens.

Lan F, Wang X, Zhou Q, Li X, Jin J, Zhang W. Microbiome. 2025; 13(1):62.

PMID: 40025569 PMC: 11871680. DOI: 10.1186/s40168-025-02054-5.

