Prediction of RNA-protein Interactions Using a Nucleotide Language Model

Overview
Journal Bioinform Adv
Specialty Biology
Date 2023 Jan 26
PMID 36699410
Abstract

Motivation: The accumulation of sequencing data has enabled researchers to predict interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to the sequences. Bidirectional Encoder Representations from Transformers (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on the BERT architecture can potentially overcome these limitations.

Results: Here, we propose BERT-RBP, a model that predicts RNA-RBP interactions by adapting the BERT architecture pretrained on a human reference genome. On the eCLIP-seq data of 154 RBPs, our model outperformed state-of-the-art prediction models. Detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and the RNA secondary structure based only on sequence information. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems.
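
To make the idea of feeding a nucleotide sequence to a language model concrete, the following is a minimal sketch of one common input scheme: splitting a sequence into overlapping k-mer "words". The abstract does not describe the tokenization used by BERT-RBP, so the function name `rna_to_kmer_tokens`, the choice of k = 3, and the U-to-T conversion (to match a model pretrained on genomic DNA) are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List


def rna_to_kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mer tokens.

    Hypothetical helper: the k-mer scheme and k=3 default are assumptions,
    not details taken from the abstract.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Assumption: map RNA (U) to the DNA alphabet (T), because the
    # language model is described as pretrained on a reference genome.
    dna = sequence.upper().replace("U", "T")
    # Sequences shorter than k yield no tokens.
    if len(dna) < k:
        return []
    # Slide a window of width k across the sequence, one base at a time.
    return [dna[i:i + k] for i in range(len(dna) - k + 1)]


tokens = rna_to_kmer_tokens("AUGGCUA", k=3)
print(tokens)  # ['ATG', 'TGG', 'GGC', 'GCT', 'CTA']
```

A token list of this kind could then be passed to a sequence classifier that outputs a binding or non-binding label for each RBP; that downstream step is not shown here because the abstract gives no details on it.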

Availability and Implementation: Python source code is freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/) and Ensembl Biomart (http://asia.ensembl.org/biomart/martview/).

Supplementary Information: Supplementary data are available online.

Citing Articles

RBPsuite 2.0: an updated RNA-protein binding site prediction suite with high coverage on species and proteins based on deep learning.

Pan X, Fang Y, Liu X, Guo X, Shen H. BMC Biol. 2025; 23(1):74.

PMID: 40069726 PMC: 11899677. DOI: 10.1186/s12915-025-02182-2.


RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models.

Asim M, Ibrahim M, Asif T, Dengel A. Heliyon. 2025; 11(2):e41488.

PMID: 39897847 PMC: 11783440. DOI: 10.1016/j.heliyon.2024.e41488.


RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.

Miyake H, Kawaguchi R, Kiryu H. Bioinform Adv. 2024; 4(1):vbae144.

PMID: 39399375 PMC: 11471262. DOI: 10.1093/bioadv/vbae144.


scEGG: an exogenous gene-guided clustering method for single-cell transcriptomic data.

Hu D, Guan R, Liang K, Yu H, Quan H, Zhao Y. Brief Bioinform. 2024; 25(6).

PMID: 39344711 PMC: 11440090. DOI: 10.1093/bib/bbae483.


Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook.

AlSaad R, Abd-Alrazaq A, Boughorbel S, Ahmed A, Renault M, Damseh R. J Med Internet Res. 2024; 26:e59505.

PMID: 39321458 PMC: 11464944. DOI: 10.2196/59505.

