RNAmining: A Machine Learning Stand-alone and Web Server Tool for RNA Coding Potential Prediction
Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNA research is the ability to distinguish coding from non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, Extreme Gradient Boosting, Neural Networks and Deep Learning) to model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) that distinguishes coding from non-coding sequences. First, we downloaded coding and non-coding sequences from Ensembl (April 14th, 2020). Then, the coding and non-coding sets were balanced, their trinucleotide counts (64 features) were computed, and the counts were normalized by sequence length, resulting in a total of 180 models. The algorithms were validated using 10-fold cross-validation, and we selected the one with the best results (eXtreme Gradient Boosting) to implement in RNAmining. The best F1-scores ranged from 97.56% to 99.57%, depending on the organism. Moreover, we benchmarked RNAmining against other tools already in the literature (CPAT, CPC2, RNAcon and TransDecoder), and RNAmining outperformed them. Both stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/.
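The preprocessing described above (balancing the two classes, counting the 64 trinucleotides and normalizing by sequence length) can be sketched as follows. This is a minimal illustration under stated assumptions, not RNAmining's actual code. The helper names `trinucleotide_features` and `undersample` are hypothetical. Several choices are also assumptions: overlapping 3-mer windows with step 1, U mapped to T, windows containing non-ACGT symbols skipped, division by the full sequence length, and random undersampling as the balancing method. The resulting vectors would then be passed to a classifier such as XGBoost and evaluated with 10-fold cross-validation, which this sketch does not cover.

```python
import random
from itertools import product

# All 64 trinucleotides in a fixed lexicographic order (AAA ... TTT),
# so every sequence maps to the same 64-dimensional feature layout.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_features(seq: str) -> list[float]:
    """Hypothetical helper: 64 trinucleotide counts divided by sequence length.

    Assumptions (not taken from RNAmining): overlapping windows with step 1,
    U treated as T, windows containing non-ACGT symbols (e.g. N) skipped,
    and normalization by the full sequence length.
    """
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    # Slide a 3-nt window across the sequence and count valid trinucleotides.
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    length = len(seq)
    if length == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    # Length normalization keeps long and short transcripts comparable.
    return [counts[k] / length for k in TRINUCLEOTIDES]


def undersample(coding: list, noncoding: list, seed: int = 0) -> tuple[list, list]:
    """Hypothetical helper: balance two classes by random undersampling.

    The abstract only says the sets were balanced; undersampling the larger
    class to the size of the smaller one is an assumption.
    """
    rng = random.Random(seed)
    n = min(len(coding), len(noncoding))
    return rng.sample(coding, n), rng.sample(noncoding, n)


if __name__ == "__main__":
    features = trinucleotide_features("AUGAUG")
    # ATG occurs twice in the 6-nt sequence, so its feature value is 2/6.
    print(features[TRINUCLEOTIDES.index("ATG")])
```

Normalizing by length means a feature value is a per-nucleotide frequency rather than a raw count. Without this step, long transcripts would dominate every feature regardless of their codon composition.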