
How Data Analysis Affects Power, Reproducibility and Biological Insight of RNA-seq Studies in Complex Datasets

Overview
Specialty: Biochemistry
Date: 2015 Jul 24
PMID: 26202970
Citations: 57
Abstract

The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors, or 'batch effects', can contribute unwanted variation to the data, commonly used RNA-seq normalization methods correct only for sequencing depth. The study of gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods are not able to adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it increases the power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as to meta-analyses of publicly available data.
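
The contrast the abstract draws, between correcting only for sequencing depth and also removing unwanted variation, can be illustrated with a minimal NumPy sketch. This is not the RUVSeq implementation (RUVSeq is an R/Bioconductor package) and not the authors' tutorial code. It only mimics the general idea of estimating unwanted factors from negative control genes, that is, genes assumed to be unaffected by the condition of interest. The function names `cpm` and `remove_unwanted_variation`, the pseudocount of 1, and the samples-by-genes layout are assumptions made for illustration.

```python
import numpy as np


def cpm(counts):
    """Depth-only normalization: counts per million for each sample (row)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)  # total reads per sample
    return counts / lib_size * 1e6


def remove_unwanted_variation(counts, control_idx, k=1):
    """Illustrative factor removal based on negative control genes.

    counts      : samples x genes matrix of read counts
    control_idx : column indices of genes assumed NOT to respond to the
                  condition of interest (an assumption the analyst must justify)
    k           : number of unwanted factors to estimate

    Returns the corrected log-scale matrix and the estimated factors W.
    """
    log_y = np.log(np.asarray(counts, dtype=float) + 1.0)  # pseudocount of 1
    centered = log_y - log_y.mean(axis=0, keepdims=True)   # center each gene

    # Variation among control genes is treated as unwanted by construction,
    # so their leading left singular vectors serve as factor estimates.
    u, _, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w = u[:, :k]

    # Regress every gene on the estimated factors and subtract the fit.
    alpha, *_ = np.linalg.lstsq(w, centered, rcond=None)
    corrected = log_y - w @ alpha
    return corrected, w
```

Calling `cpm(counts)` only rescales each sample by its library size, so a batch effect that shifts many genes together survives the normalization. `remove_unwanted_variation(counts, control_idx, k=1)` additionally subtracts the component captured by the control genes. In practice the estimated factors are often included as covariates in the differential expression model rather than subtracted from the data, and the choice of control genes and of `k` strongly affects the outcome.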

Citing Articles

A global transcriptional atlas of the effect of acute sleep deprivation in the mouse frontal cortex.

Ford K, Zuin E, Righelli D, Medina E, Schoch H, Singletary K. iScience. 2024; 27(9):110752.

PMID: 39280614 PMC: 11402219. DOI: 10.1016/j.isci.2024.110752.


Transcriptional dynamics of sleep deprivation and subsequent recovery sleep in the male mouse cortex.

Popescu A, Ottaway C, Ford K, Patterson T, Ingiosi A, Medina E. bioRxiv. 2024.

PMID: 39229182 PMC: 11370348. DOI: 10.1101/2024.08.20.607983.


Influenza A virus during pregnancy disrupts maternal intestinal immunity and fetal cortical development in a dose- and time-dependent manner.

Otero A, Connolly M, Gonzalez-Ricon R, Wang S, Allen J, Antonson A. Mol Psychiatry. 2024; 30(1):13-28.

PMID: 38961232 PMC: 11649561. DOI: 10.1038/s41380-024-02648-9.


Gestational age at birth influences protein and RNA content in human milk extracellular vesicles.

Vahkal B, Altosaar I, Tremblay E, Gagne D, Huttman N, Minic Z. J Extracell Biol. 2024; 3(1):e128.

PMID: 38938674 PMC: 11080785. DOI: 10.1002/jex2.128.


Serotonin Transporter-dependent Histone Serotonylation in Placenta Contributes to the Neurodevelopmental Transcriptome.

Chan J, Alenina N, Cunningham A, Ramakrishnan A, Shen L, Bader M. J Mol Biol. 2024; 436(7):168454.

PMID: 38266980 PMC: 10957302. DOI: 10.1016/j.jmb.2024.168454.

