
Calculating Sample Size Estimates for RNA Sequencing Data

Overview
Journal J Comput Biol
Date 2013 Aug 22
PMID 23961961
Citations 166
Abstract

Background: Given its high technical reproducibility and a resolution orders of magnitude greater than that of microarrays, next-generation sequencing of mRNA (RNA-Seq) is quickly becoming the de facto standard for measuring gene expression levels in biological experiments. Two important questions must be considered when designing an experiment: 1) how deep does one need to sequence, and 2) how many biological replicates are necessary to observe a significant change in expression?

Results: Based on the gene expression distributions from 127 RNA-Seq experiments, we find evidence that 91% ± 4% of all annotated genes are sequenced at a frequency of 0.1 times per million bases mapped, regardless of sample source. Based on this observation, and combining this information with other parameters such as biological variation and technical variation that we empirically estimate from our large datasets, we developed a model to estimate the statistical power needed to identify differentially expressed genes from RNA-Seq experiments.

Conclusions: Our results provide a needed reference for ensuring RNA-Seq gene expression studies are conducted with the optimal sample size, power, and sequencing depth. We also make available both R code and an Excel worksheet for investigators to calculate these values for their own experiments.
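The abstract does not state the model itself, so the following is only a minimal sketch of the kind of calculation it describes. It assumes a widely used normal-approximation formula for count data, in which the per-gene variance on the log scale is taken to be 1/depth (technical, Poisson-like) plus the squared biological coefficient of variation. The function name `samples_per_group` and all parameter names are hypothetical, and the authors' own R code and Excel worksheet may differ in detail.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.9):
    """Approximate biological replicates needed per group (two-group design).

    ASSUMPTION: normal approximation on the log scale with
    variance = 1/depth + cv**2. This is an illustrative formula,
    not one quoted from the abstract.

    depth  : expected read count for the gene of interest
    cv     : biological coefficient of variation between replicates
    effect : fold change to detect (for example 2.0 for a doubling)
    alpha  : two-sided type I error rate
    power  : desired probability of detecting the effect
    """
    if depth <= 0 or cv < 0 or effect <= 0 or effect == 1:
        raise ValueError("depth > 0, cv >= 0, effect > 0 and effect != 1 required")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power

    # Technical (counting) noise shrinks with depth; biological noise does not.
    variance = 1.0 / depth + cv ** 2

    return 2 * (z_alpha + z_beta) ** 2 * variance / math.log(effect) ** 2


if __name__ == "__main__":
    # Hypothetical inputs: 20 reads per gene, CV of 0.4, detect a 2-fold change.
    n = samples_per_group(depth=20, cv=0.4, effect=2.0)
    print(f"replicates per group (round up): {math.ceil(n)}")
```

Under this approximation, raising sequencing depth only reduces the 1/depth term, so once depth is moderate the biological variation dominates and adding replicates helps more than sequencing deeper.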

Citing Articles

Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq.

Zhang H, Li X, Song D, Yukselen O, Nanda S, Kucukural A bioRxiv. 2025; .

PMID: 39975282 PMC: 11838469. DOI: 10.1101/2025.02.02.636107.


Interspecies differences in the transcriptome response of corals to acute heat stress.

Da-Anoy J, Posadas N, Conaco C PeerJ. 2024; 12:e18627.

PMID: 39677947 PMC: 11639872. DOI: 10.7717/peerj.18627.


Cross-species conserved miRNA as biomarker of radiation injury over a wide dose range using nonhuman primate model.

Chakraborty N, Dimitrov G, Kanan S, Lawrence A, Moyler C, Gautam A PLoS One. 2024; 19(11):e0311379.

PMID: 39570918 PMC: 11581275. DOI: 10.1371/journal.pone.0311379.


Metabolic differentiation of brushtail possum populations resistant and susceptible to plant toxins revealed via differential gene expression.

Carmelet-Rescan D, Morgan-Richards M, Trewick S J Comp Physiol B. 2024; 195(1):103-121.

PMID: 39495241 PMC: 11839783. DOI: 10.1007/s00360-024-01591-z.


Comparative transcriptome analysis of cucumber fruit tissues reveals novel regulatory genes in ascorbic acid biosynthesis.

Ren J, Fu S, Wang H, Wang W, Wang X, Zhang H PeerJ. 2024; 12:e18327.

PMID: 39469594 PMC: 11514761. DOI: 10.7717/peerj.18327.

