
Optimization of miRNA-seq Data Preprocessing

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2015 Apr 19
PMID: 25888698
Citations: 71
Abstract

The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and as promising biomarkers for disease. Concurrent developments in high-throughput profiling technology have further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before downstream analyses are conducted. Although often overlooked, data preprocessing is an essential step in data analysis: unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M-values, DESeq, linear regression, cyclic loess and quantile) on the distribution, variance and bias of the final miRNA count data and on the accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for extracting and interpreting miRNA count data from small RNA-sequencing experiments.
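The normalization methods listed above differ mainly in how they estimate a per-sample scaling factor or how they reshape each sample's count distribution. As a rough illustration only, the sketch below implements three of them in simplified, pure-Python form: counts-per-million, DESeq-style median-of-ratios size factors, and quantile normalization. It is not the code or software evaluated in the study; the function names and the input layout (a list of samples, each a list of per-miRNA read counts in the same miRNA order) are hypothetical choices made for this example.

```python
"""Simplified sketches of three count-normalization methods for miRNA-seq.

Illustrative only: function names and the input layout (a list of samples,
each a list of per-miRNA read counts in the same miRNA order) are
hypothetical choices, not the software evaluated in the study.
"""
import math
from statistics import median


def cpm(sample):
    """Counts-per-million: scale counts by the sample's total library size."""
    total = sum(sample)
    return [c * 1e6 / total for c in sample]


def deseq_size_factors(samples):
    """DESeq-style median-of-ratios size factors, one per sample.

    Each count is divided by that miRNA's geometric mean across samples;
    a sample's size factor is the median of these ratios.
    """
    n_mirnas = len(samples[0])
    ratios = [[] for _ in samples]
    for g in range(n_mirnas):
        counts = [s[g] for s in samples]
        if min(counts) <= 0:
            # Geometric mean is zero or undefined, so this miRNA is skipped.
            continue
        geo_mean = math.exp(sum(math.log(c) for c in counts) / len(counts))
        for i, c in enumerate(counts):
            ratios[i].append(c / geo_mean)
    # Raises statistics.StatisticsError if no miRNA is non-zero in every sample.
    return [median(r) for r in ratios]


def quantile_normalize(samples):
    """Quantile normalization: give every sample the same count distribution.

    The value at each rank is replaced by the mean, across samples, of the
    values at that rank. Ties are broken by miRNA order (a simplification).
    """
    n_mirnas = len(samples[0])
    # Per-sample miRNA indices sorted from smallest to largest count.
    orders = [sorted(range(n_mirnas), key=s.__getitem__) for s in samples]
    # Mean count at each rank position across all samples.
    rank_means = [
        sum(s[order[r]] for s, order in zip(samples, orders)) / len(samples)
        for r in range(n_mirnas)
    ]
    normalized = []
    for order in orders:
        row = [0.0] * n_mirnas
        for r, g in enumerate(order):
            row[g] = rank_means[r]
        normalized.append(row)
    return normalized
```

For example, `deseq_size_factors([[10, 20, 30, 40], [20, 40, 60, 80]])` returns factors of about 0.707 and 1.414; dividing each sample's counts by its factor makes the two samples identical. Scaling methods such as counts-per-million and DESeq only rescale each sample, whereas quantile normalization forces all samples onto one shared distribution, which is a stronger assumption about the data.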

Citing Articles

Eight quick tips for biologically and medically informed machine learning.

Oneto L, Chicco D. PLoS Comput Biol. 2025; 21(1):e1012711.

PMID: 39787089. PMC: 11717244. DOI: 10.1371/journal.pcbi.1012711.


miRNA expression signatures induced by Pasteurella multocida infection in goats lung.

Xu F, Zheng H, Dong X, Zhou A, Emu Q. Sci Rep. 2024; 14(1):19626.

PMID: 39179681. PMC: 11343864. DOI: 10.1038/s41598-024-69654-3.


Unraveling the signaling network between dysregulated microRNA and mRNA expression in sevoflurane-induced developmental neurotoxicity in rat.

Wang Y, Men X, Huang X, Qiu X, Wang W, Zhou J. Heliyon. 2024; 10(13):e33333.

PMID: 39027541. PMC: 11255675. DOI: 10.1016/j.heliyon.2024.e33333.


Dynamic changes in extracellular vesicle-associated miRNAs elicited by ultrasound in inflammatory bowel disease patients.

Tran F, Scharmacher A, Baran N, Mishra N, Wozny M, Chavez S. Sci Rep. 2024; 14(1):10925.

PMID: 38740826. PMC: 11091140. DOI: 10.1038/s41598-024-61532-2.


Functional impact of multi-omic interactions in lung cancer.

Diaz-Campos M, Vasquez-Arriaga J, Ochoa S, Hernandez-Lemus E. Front Genet. 2024; 15:1282241.

PMID: 38389572. PMC: 10881857. DOI: 10.3389/fgene.2024.1282241.

