
Normalization, Testing, and False Discovery Rate Estimation for RNA-sequencing Data

Overview
Journal: Biostatistics
Specialty: Public Health
Date: 2011 Oct 18
PMID: 22003245
Citations: 159
Abstract

We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.
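To make the normalization problem concrete, here is a minimal sketch of a Poisson log-linear null model in which the expected count for sample i and gene j is d_i * beta_j, where d_i is a sequencing-depth factor. The abstract does not spell out the normalization or the test statistic, so everything below is an assumption for illustration: the depth factor is plain total-count scaling (a stand-in, not the paper's new approach), and the function names `depth_factors`, `expected_counts`, and `class_scores` are hypothetical.

```python
def depth_factors(counts):
    """Assumed stand-in: depth of sample i = its total reads / all reads."""
    totals = [sum(row) for row in counts]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def expected_counts(counts):
    """Expected count d_i * beta_j under a null of no association,
    with beta_j taken as the total count of gene j across samples."""
    depths = depth_factors(counts)
    n_genes = len(counts[0])
    gene_totals = [sum(row[j] for row in counts) for j in range(n_genes)]
    return [[d * g for g in gene_totals] for d in depths]


def class_scores(counts, labels):
    """Per-gene score: sum over classes of (observed - expected)^2 / expected.
    Larger values suggest a stronger departure from the null."""
    expected = expected_counts(counts)
    n_genes = len(counts[0])
    scores = []
    for j in range(n_genes):
        score = 0.0
        for c in sorted(set(labels)):
            rows = [i for i, lab in enumerate(labels) if lab == c]
            obs = sum(counts[i][j] for i in rows)
            exp = sum(expected[i][j] for i in rows)
            if exp > 0:
                score += (obs - exp) ** 2 / exp
        scores.append(score)
    return scores


# Toy data (made up): rows are samples, columns are genes.
# Samples 2 and 4 were sequenced twice as deeply as samples 1 and 3.
counts = [
    [10, 20, 5],
    [20, 40, 10],
    [10, 20, 20],
    [20, 40, 40],
]
labels = [0, 0, 1, 1]
scores = class_scores(counts, labels)
```

In this toy data only the third gene changes between classes, and it receives the largest score. The first two genes still get nonzero scores even though their raw counts are identical across classes, because the third gene inflates the total reads of the second class. That distortion is one reason total-count scaling can be a poor depth estimate.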

Citing Articles

Normalization and selecting non-differentially expressed genes improve machine learning modelling of cross-platform transcriptomic data.

Deng F, Feng C, Gao N, Zhang L. ArXiv. 2025.

PMID: 39975431 PMC: 11838701.


Analysis of gene expression of Babesia gibsoni cultured with diminazene aceturate using RNA sequencing.

Matsuda N, Ito M, Nukada Y, Toyoma M, Nagai K, Motegi T. J Vet Med Sci. 2025; 87(2):181-188.

PMID: 39756884 PMC: 11830443. DOI: 10.1292/jvms.24-0395.


Genome-wide profiling of DNA repair proteins in single cells.

de Luca K, Rullens P, Karpinska M, de Vries S, Gacek-Matthews A, Pongor L. Nat Commun. 2024; 15(1):9918.

PMID: 39572529 PMC: 11582664. DOI: 10.1038/s41467-024-54159-4.


Quantitative proteomics reveals extensive lysine ubiquitination and transcription factor stability states in Arabidopsis.

Song G, Montes C, Olatunji D, Malik S, Ji C, Clark N. Plant Cell. 2024; 37(1).

PMID: 39570863 PMC: 11663597. DOI: 10.1093/plcell/koae310.


Global impacts of peroxisome and pexophagy dysfunction revealed through multi-omics analyses of lon2 and atg2 mutants.

Muhammad D, Clark N, Tharp N, Chatt E, Vierstra R, Bartel B. Plant J. 2024; 120(6):2563-2583.

PMID: 39526456 PMC: 11658196. DOI: 10.1111/tpj.17129.

