
RNAmining: A Machine Learning Stand-alone and Web Server Tool for RNA Coding Potential Prediction

Overview
Journal F1000Res
Date 2021 Jun 25
PMID 34164114
Citations 5
Abstract

Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. A key step in ncRNA research is distinguishing coding from non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, Extreme Gradient Boosting, Neural Networks and Deep Learning) to model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) that distinguishes coding and non-coding sequences. First, we used coding and non-coding sequences downloaded from Ensembl (April 14th, 2020). The coding and non-coding sets were then balanced, their trinucleotide counts were computed (64 features), and the counts were normalized by sequence length, resulting in a total of 180 models. The algorithms were validated using 10-fold cross-validation, and the algorithm with the best results (eXtreme Gradient Boosting) was selected for implementation in RNAmining. The best F1-scores ranged from 97.56% to 99.57%, depending on the organism. Moreover, we benchmarked RNAmining against other tools from the literature (CPAT, CPC2, RNAcon and TransDecoder), and it outperformed them. Both the stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/.
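
The feature step described in the abstract (64 trinucleotide counts normalized by sequence length) can be illustrated with a minimal Python sketch. This is not RNAmining's own code: the function name `trinucleotide_features`, the use of overlapping windows with a step of 1, the conversion of U to T, and the skipping of windows that contain ambiguous bases are all assumptions made here for illustration, since the abstract does not specify them.

```python
from itertools import product
from typing import List

# All 64 trinucleotides over the DNA alphabet, in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_features(sequence: str) -> List[float]:
    """Return 64 trinucleotide counts, each divided by the sequence length.

    Hypothetical helper: the windowing and normalization details are
    assumptions, not taken from the RNAmining implementation.
    """
    seq = sequence.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)

    # Assumption: overlapping windows that advance one base at a time.
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1

    length = len(seq)
    if length == 0:
        return [0.0] * len(TRINUCLEOTIDES)

    # Normalize each count by the full sequence length.
    return [counts[k] / length for k in TRINUCLEOTIDES]
```

Each sequence then maps to a fixed-length vector of 64 values, which is the kind of input a classifier such as a gradient-boosted tree model could be trained and cross-validated on.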

Citing Articles

Transcriptomic Analysis Reveals Adaptive Evolution and Conservation Implications for the Endangered .

Shi C, Xie Y, Guan D, Qin G Genes (Basel). 2024; 15(6).

PMID: 38927723 PMC: 11203017. DOI: 10.3390/genes15060787.


A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder.

Wang Y, Pan Z, Mou M, Xia W, Zhang H, Zhang H Nucleic Acids Res. 2023; 51(21):e110.

PMID: 37889083 PMC: 10682500. DOI: 10.1093/nar/gkad929.


Discovery of putative long non-coding RNAs expressed in the eyes of Astyanax mexicanus (Actinopterygii: Characidae).

Batista da Silva I, Barbosa D, Kavalco K, Nunes L, Pasa R, Menegidio F Sci Rep. 2023; 13(1):12051.

PMID: 37491348 PMC: 10368750. DOI: 10.1038/s41598-023-34198-5.


RNAincoder: a deep learning-based encoder for RNA and RNA-associated interaction.

Wang Y, Chen Z, Pan Z, Huang S, Liu J, Xia W Nucleic Acids Res. 2023; 51(W1):W509-W519.

PMID: 37166951 PMC: 10320175. DOI: 10.1093/nar/gkad404.


Pervasive translation of small open reading frames in plant long non-coding RNAs.

Sruthi K, Menon A, P A, Soniya E Front Plant Sci. 2022; 13:975938.

PMID: 36352887 PMC: 9638090. DOI: 10.3389/fpls.2022.975938.
