
A Powerful and Flexible Approach to the Analysis of RNA Sequence Count Data

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2011 Aug 4
PMID: 21810900
Citations: 62
Abstract

Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean-variance relationships provides a flexible testing regimen that 'borrows' information across genes, while easily incorporating design effects and additional covariates.

Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean-variance modeling approach to RNA-Seq data, in which the overdispersion is modeled as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with alternative methods for handling RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.
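
To make the beta-binomial generalized linear model concrete, the sketch below evaluates a beta-binomial log-likelihood for one gene under a logit link. It is a minimal illustration and not the BBSeq implementation: the parameterization (mean proportion `mu`, overdispersion `phi` in (0, 1), giving variance n·mu·(1−mu)·[1+(n−1)·phi]) is a common textbook choice assumed here, and all function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def betabinom_logpmf(y, n, mu, phi):
    """Log-probability of y reads for a gene out of n total reads.

    Assumed parameterization (not taken from the paper):
    mu  = expected proportion of reads mapping to the gene,
    phi = overdispersion, 0 < phi < 1 (phi -> 0 approaches the binomial).
    """
    a = mu * (1.0 - phi) / phi
    b = (1.0 - mu) * (1.0 - phi) / phi
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def betabinom_variance(n, mu, phi):
    """Variance under the parameterization above; binomial variance times an inflation factor."""
    return n * mu * (1.0 - mu) * (1.0 + (n - 1) * phi)


def log_likelihood(counts, totals, design, coef, phi):
    """Sum of per-sample log-probabilities for one gene.

    design: one row of covariates per sample (discrete factors coded as 0/1,
            continuous covariates as numbers); coef: regression coefficients.
    """
    ll = 0.0
    for y, n, x in zip(counts, totals, design):
        eta = sum(xj * bj for xj, bj in zip(x, coef))
        ll += betabinom_logpmf(y, n, inv_logit(eta), phi)
    return ll
```

A fit would maximize `log_likelihood` over `coef` and `phi` with a numerical optimizer, and a group effect could then be tested by comparing the maximized likelihoods with and without the corresponding design column. The second approach described above would replace the free `phi` with a function of the fitted mean shared across genes; that step is not sketched here.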

Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq

Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Error modelled gene expression analysis (EMOGEA) provides a superior overview of time course RNA-seq measurements and low count gene expression.

Barra J, Taverna F, Bong F, Ahmed I, Karakach T Brief Bioinform. 2024; 25(3).

PMID: 38770716 PMC: 11106635. DOI: 10.1093/bib/bbae233.


Pairwise ratio-based differential abundance analysis of infant microbiome 16S sequencing data.

Mildau K, Te Beest D, Engel B, Gort G, Lambert J, Swinkels S NAR Genom Bioinform. 2023; 5(1):lqad001.

PMID: 36685726 PMC: 9853100. DOI: 10.1093/nargab/lqad001.


A Framework for Comparison and Assessment of Synthetic RNA-Seq Data.

Shakola F, Palejev D, Ivanov I Genes (Basel). 2022; 13(12).

PMID: 36553629 PMC: 9778097. DOI: 10.3390/genes13122362.


On taming the effect of transcript level intra-condition count variation during differential expression analysis: A story of dogs, foxes and wolves.

Lobo D, Linheiro R, Godinho R, Archer J PLoS One. 2022; 17(9):e0274591.

PMID: 36136981 PMC: 9498955. DOI: 10.1371/journal.pone.0274591.


NBBt-test: a versatile method for differential analysis of multiple types of RNA-seq data.

Tan Y, Guda C Sci Rep. 2022; 12(1):12833.

PMID: 35896555 PMC: 9329447. DOI: 10.1038/s41598-022-15762-x.

