
Robustly Detecting Differential Expression in RNA Sequencing Data Using Observation Weights

Overview
Specialty: Biochemistry
Date: 2014 Apr 23
PMID: 24753412
Citations: 222
Abstract

A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of 'sharing of information' across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/.
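The idea of observation weights can be illustrated with a minimal sketch. The abstract does not describe the weighting scheme, so everything below is an assumption made for illustration: a single gene in a single condition, a negative binomial variance of mu + dispersion * mu^2 with a fixed dispersion, and Huber-type weights computed from Pearson residuals. The function names `huber_weight` and `robust_nb_mean` are hypothetical and are not part of edgeR or any other package.

```python
import math


def huber_weight(residual, k=1.345):
    """Huber weight: 1 inside [-k, k], shrinking as k/|r| outside (assumed scheme)."""
    a = abs(residual)
    return 1.0 if a <= k else k / a


def robust_nb_mean(counts, dispersion=0.1, k=1.345, n_iter=10):
    """Iteratively reweighted mean for one gene's counts in one condition.

    Assumes a negative binomial variance, mu + dispersion * mu^2, with a
    fixed (hypothetical) dispersion value. Returns the mean and the weights.
    """
    weights = [1.0] * len(counts)
    mu = sum(counts) / len(counts)
    for _ in range(n_iter):
        # Weighted mean under the current observation weights.
        mu = sum(w * y for w, y in zip(weights, counts)) / sum(weights)
        sd = math.sqrt(mu + dispersion * mu * mu)
        if sd == 0:
            break  # all counts are zero; nothing to reweight
        # Pearson residuals drive the next round of weights.
        weights = [huber_weight((y - mu) / sd, k) for y in counts]
    return mu, weights


# One replicate (200) is far from the other four.
mu, weights = robust_nb_mean([10, 12, 11, 9, 200])
print(round(mu, 2), [round(w, 3) for w in weights])
```

In this toy setting the outlying replicate receives a small weight, so the estimated mean stays near the other four counts instead of being pulled toward the plain average of 48.4. When no replicate is extreme, all weights stay at 1 and the estimate equals the ordinary mean.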

Citing Articles

Spatial transcriptomic analysis identifies epithelium-macrophage crosstalk in endometriotic lesions.

Burns G, Fu Z, Vegter E, Madaj Z, Greaves E, Flores I. iScience. 2025; 28(2):111790.

PMID: 39935459 PMC: 11810701. DOI: 10.1016/j.isci.2025.111790.


edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets.

Chen Y, Chen L, Lun A, Baldoni P, Smyth G. Nucleic Acids Res. 2025; 53(2).

PMID: 39844453 PMC: 11754124. DOI: 10.1093/nar/gkaf018.


Robust double machine learning model with application to omics data.

Wang X, Liu Y, Qin G, Yu Y. BMC Bioinformatics. 2024; 25(1):355.

PMID: 39543508 PMC: 11566156. DOI: 10.1186/s12859-024-05975-4.


YAP1 and WWTR1 are required for murine pregnancy initiation.

Moldovan G, Massri N, Vegter E, Pauneto-Delgado I, Burns G, Joshi N. Reproduction. 2024; 169(1).

PMID: 39503541 PMC: 11874952. DOI: 10.1530/REP-24-0355.


Histone deacetylase 9 promotes osteogenic trans-differentiation of vascular smooth muscle cells via ferroptosis in chronic kidney disease vascular calcification.

Xiong L, Xiao Q, Chen R, Huang L, Gao J, Wang L. Ren Fail. 2024; 46(2):2422435.

PMID: 39500708 PMC: 11539403. DOI: 10.1080/0886022X.2024.2422435.

