
EMDLP: Ensemble Multiscale Deep Learning Model for RNA Methylation Site Prediction

Overview
Publisher Biomed Central
Specialty Biology
Date 2022 Jun 8
PMID 35676633
Abstract

Background: Recent research suggests that epitranscriptome regulation through post-transcriptional RNA modifications is essential for all kinds of RNA. Accurate identification of RNA modifications is vital for understanding their functions and regulatory mechanisms. However, traditional experimental methods for identifying RNA modification sites are relatively complicated, time-consuming, and laborious. Machine learning approaches have been applied to extract and classify RNA sequence features computationally, and can supplement experimental approaches more efficiently. Recently, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have achieved good results in modification site prediction owing to their strength in representation learning. However, a CNN can learn local responses from spatial data but cannot learn sequential correlations, whereas an LSTM is specialized for sequential modeling and can access contextual representations but lacks the spatial feature extraction of a CNN. These complementary limitations provide strong motivation to construct a prediction framework that combines natural language processing (NLP) and deep learning (DL).

Results: This study presents an ensemble multiscale deep learning predictor (EMDLP) that identifies RNA methylation sites using NLP and DL. It combines dilated convolution with a bidirectional LSTM (BiLSTM) to make better use of local and global information for site prediction. The first step of EMDLP is to represent the RNA sequences in an NLP manner: three encodings, namely RNA word embedding, one-hot encoding, and RGloVe (an improved word-vector learning method based on GloVe), are adopted to characterize sites from the viewpoints of local and global information. Then, a dilated convolutional bidirectional LSTM network (DCB) model, consisting of a dilated convolutional neural network (DCNN) followed by a BiLSTM, is constructed to extract potentially contributing features for methylation site prediction. Finally, the predictions from the three encoding methods are integrated by soft voting to obtain better predictive performance. Experimental results on m1A and m6A show that EMDLP achieves an area under the receiver operating characteristic curve (AUROC) of 95.56% and 85.24%, respectively, outperforming the state-of-the-art models. For user convenience, a webserver for EMDLP is publicly available at http://www.labiip.net/EMDLP/index.php ( http://47.104.130.81/EMDLP/index.php ).
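
To make two of the steps above concrete, the following is a minimal Python sketch of one-hot encoding an RNA sequence and of combining three models' outputs by soft voting. It is an illustration under stated assumptions, not the EMDLP implementation: the nucleotide order `ACGU`, the all-zero row for unknown symbols, the equal weighting of the three models, the 0.5 decision threshold, and the example probabilities are all hypothetical, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence

# Assumed column order; the abstract does not state the one used by EMDLP.
ALPHABET = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Map each nucleotide to a 4-dimensional indicator vector.

    'T' is treated as 'U'. Symbols outside ALPHABET (e.g. 'N') become an
    all-zero row, which is a common convention but an assumption here.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1 if base == ref else 0 for ref in ALPHABET])
    return rows


def soft_vote(prob_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-sample positive-class probabilities across models.

    Equal weights are assumed; a weighted average is another option.
    """
    columns = list(prob_by_model.values())
    if not columns:
        raise ValueError("need at least one model")
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all models must score the same samples")
    return [sum(col[i] for col in columns) / len(columns) for i in range(n)]


# Hypothetical probabilities from three models, one per encoding,
# for three candidate sites.
probs = {
    "rna_word_embedding": [0.9, 0.2, 0.6],
    "one_hot": [0.8, 0.4, 0.3],
    "rglove": [0.7, 0.3, 0.45],
}
ensemble = soft_vote(probs)
# Assumed decision threshold of 0.5 on the averaged probability.
labels = [int(p >= 0.5) for p in ensemble]
```

In this sketch the third site is called positive by one model alone (0.6) but negative by the ensemble, which shows how soft voting uses the confidence of every model rather than counting hard votes.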

Conclusions: We developed a predictor for m1A and m6A methylation sites.

Citing Articles

RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models.

Asim M, Ibrahim M, Asif T, Dengel A. Heliyon. 2025; 11(2):e41488.

PMID: 39897847 PMC: 11783440. DOI: 10.1016/j.heliyon.2024.e41488.


DeepIRES: a hybrid deep learning model for accurate identification of internal ribosome entry sites in cellular and viral mRNAs.

Zhao J, Chen Z, Zhang M, Zou L, He S, Liu J. Brief Bioinform. 2024; 25(5).

PMID: 39234953 PMC: 11375421. DOI: 10.1093/bib/bbae439.


Molecular insights into regulatory RNAs in the cellular machinery.

Yang S, Kim S, Yang E, Kang M, Joo J. Exp Mol Med. 2024; 56(6):1235-1249.

PMID: 38871819 PMC: 11263585. DOI: 10.1038/s12276-024-01239-6.


PseUpred-ELPSO Is an Ensemble Learning Predictor with Particle Swarm Optimizer for Improving the Prediction of RNA Pseudouridine Sites.

Wang X, Li P, Wang R, Gao X. Biology (Basel). 2024; 13(4).

PMID: 38666860 PMC: 11048358. DOI: 10.3390/biology13040248.


Role of Post-Transcriptional Regulation in Learning and Memory in Mammals.

Di Liegro C, Schiera G, Schiro G, Di Liegro I. Genes (Basel). 2024; 15(3).

PMID: 38540396 PMC: 10970538. DOI: 10.3390/genes15030337.

