
Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning

Overview
Journal Adv Sci (Weinh)
Date 2024 Aug 19
PMID 39159140
Abstract

The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints are similar to the grammar and syntax of human languages and can be modeled by advanced natural language processing techniques such as Transformers, which have been very effective in modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RNA-binding protein (RBP) binding sites and m6A RNA modification sites and predicting RNA sub-cellular localizations. Benchmark results show that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationships between sequence elements and effectively identifies regions with important regulatory potential. It is expected that the 3UTRBERT model can serve as a foundational tool for various sequence labeling tasks in the 3'UTR field, thus enhancing the decipherability of post-transcriptional regulatory mechanisms.
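
As an illustration of the task-agnostic pre-training step, the sketch below prepares one masked-language-model training example from a 3'UTR sequence. The abstract does not state how sequences are tokenized or masked, so the overlapping k-mer tokenization, the value of `k`, the 15% masking rate, and the function names `kmer_tokenize` and `mask_tokens` are all assumptions made for illustration, not the authors' implementation.

```python
import random

def kmer_tokenize(seq, k=3):
    """Split a sequence into overlapping k-mers with stride 1.

    Assumption: k-mer tokens; the abstract does not specify the tokenizer.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens, as in BERT-style pre-training.

    Returns the masked token list and a dict mapping each masked
    position to the original token the model would have to predict.
    The 15% rate is an assumed default, not a value from the paper.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for p in positions:
        labels[p] = masked[p]
        masked[p] = mask_token
    return masked, labels

# Hypothetical toy sequence containing an AU-rich stretch.
tokens = kmer_tokenize("AUUUAUUUAGCA", k=3)
masked, labels = mask_tokens(tokens)
print(tokens)
print(masked)
print(labels)
```

With overlapping k-mers, a single masked token can largely be inferred from its neighbours, so practical implementations often mask contiguous runs of tokens; this sketch masks independent positions only to keep the idea visible.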

Citing Articles

EnrichRBP: an automated and interpretable computational platform for predicting and analysing RNA-binding protein events.

Wang Y, Zhu H, Wang Y, Yang Y, Huang Y, Zhang J Bioinformatics. 2025; 41(1).

PMID: 39804669 PMC: 11783304. DOI: 10.1093/bioinformatics/btaf018.


A generative framework for enhanced cell-type specificity in rationally designed mRNAs.

Khoroshkin M, Zinkevich A, Aristova E, Yousefi H, Lee S, Mittmann T bioRxiv. 2025; .

PMID: 39803435 PMC: 11722239. DOI: 10.1101/2024.12.31.630783.


Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning.

Yang Y, Li G, Pang K, Cao W, Zhang Z, Li X Adv Sci (Weinh). 2024; 11(39):e2407013.

PMID: 39159140 PMC: 11497048. DOI: 10.1002/advs.202407013.


Advancing bioinformatics with large language models: components, applications and perspectives.

Liu J, Yang M, Yu Y, Xu H, Wang T, Li K ArXiv. 2024; .

PMID: 38259343 PMC: 10802675.
