
Ensemble-based Classification Approach for Micro-RNA Mining Applied on Diverse Metagenomic Sequences

Overview
Journal BMC Res Notes
Publisher BioMed Central
Date 2014 Jun 3
PMID 24884968
Citations 2
Abstract

Background: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow, since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identifying miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons.

Results: The present paper focuses on integrating miRNA classifiers into a meta-classifier and on identifying miRNAs in metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR) that use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 candidates have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
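
The six reported figures are all derived from one confusion matrix, and the sketch below shows how they relate. The counts used (74 true positives, 26 false negatives, 194 true negatives, 6 false positives) are hypothetical: the abstract does not state the test-set size, and these values were chosen only because they reproduce the reported percentages. The function name `confusion_metrics` is likewise an illustrative assumption, not part of the authors' tool.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)          # recall on real miRNA hairpins
    specificity = tn / (tn + fp)          # recall on pseudo hairpins
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts (NOT from the paper): 100 real and 200 pseudo hairpins,
# picked so that the resulting ratios match the percentages in the abstract.
metrics = confusion_metrics(tp=74, fp=6, tn=194, fn=26)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the F-measure follows directly from the reported precision and sensitivity, and the combination of high specificity with lower sensitivity shows that the ensemble is conservative: it rarely labels a pseudo hairpin as a miRNA, at the cost of missing about a quarter of the real ones.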

Conclusions: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising candidate miRNA hairpins for subsequent validation by cloning-based methods. Among the ensemble's predictions are pre-miRNA candidates that have been validated using miRBase although they were not recognized by some of the base classifiers.

Citing Articles

Editorial: Computational modelling of cardiovascular hemodynamics and machine learning.

Bourantas C, Torii R, Karabasov S, Krams R Front Cardiovasc Med. 2024; 11:1355843.

PMID: 38455721 PMC: 10917996. DOI: 10.3389/fcvm.2024.1355843.


REGULATOR: a database of metazoan transcription factors and maternal factors for developmental studies.

Wang K, Nishida H BMC Bioinformatics. 2015; 16:114.

PMID: 25880930 PMC: 4411712. DOI: 10.1186/s12859-015-0552-x.
