IRESpy: an XGBoost Model for Prediction of Internal Ribosome Entry Sites

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2019 Aug 1
PMID: 31362694
Citations: 35
Abstract

Background: Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the 5' cap-dependent translation initiation mechanism. IRES usually function when 5' cap-dependent translation initiation has been blocked or repressed. They have been widely found to play important roles in viral infections and cellular processes. However, only a limited number of confirmed IRES have been reported, because confirming them requires labor-intensive, slow, and low-efficiency laboratory experiments. Bioinformatics tools have been developed, but there is no reliable online tool.

Results: This paper systematically examines the features that can distinguish IRES from non-IRES sequences. Sequence features such as kmer words, structural features such as QMFE, and sequence/structure hybrid features are evaluated as possible discriminators. They are incorporated into an IRES classifier based on XGBoost. The XGBoost model performs better than previous classifiers, with higher accuracy and much shorter computation time. Compared to previous predictors, the number of features in the model has been greatly reduced by including global kmer and structural features. The contributions of model features are well explained by LIME and SHapley Additive exPlanations (SHAP). The trained XGBoost model has been implemented as a bioinformatics tool for IRES prediction, IRESpy (https://irespy.shinyapps.io/IRESpy/), which has been applied to scan the human 5' UTR and find novel IRES segments.

Conclusions: IRESpy is a fast, reliable, high-throughput online tool for IRES prediction. It is publicly available to all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
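
As an illustration of the "global kmer" sequence features mentioned in the Results, the following minimal sketch computes normalized k-mer word frequencies over a whole RNA sequence. It is not the authors' implementation: the function name `kmer_frequencies`, the default `k=2`, the conversion of T to U, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the frequency of every possible k-mer word in an RNA sequence.

    Hypothetical helper for illustration only. Windows containing
    characters outside `alphabet` (e.g. N) are ignored.
    """
    # Normalize case and treat DNA-style input as RNA (assumption).
    seq = seq.upper().replace("T", "U")

    # Enumerate all len(alphabet)**k words in a fixed order so that
    # every sequence yields a feature vector of the same length.
    words = ["".join(p) for p in product(alphabet, repeat=k)]

    # Count overlapping windows of length k across the whole sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Normalize by the number of valid windows so that sequences of
    # different lengths are comparable.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


if __name__ == "__main__":
    features = kmer_frequencies("GGGAAACCCUUU", k=2)
    print(len(features), features["GG"])
```

A vector built this way has a fixed length (16 values for k = 2) and could be combined with structural features and passed to a gradient-boosted classifier such as XGBoost. Which values of k and which additional features the published model actually uses are not specified in this abstract.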

Citing Articles

Long non-coding RNA-encoded micropeptides: functions, mechanisms and implications.

Xiao Y, Ren Y, Hu W, Paliouras A, Zhang W, Zhong L. Cell Death Discov. 2024; 10(1):450.

PMID: 39443468 PMC: 11499885. DOI: 10.1038/s41420-024-02175-0.


DeepIRES: a hybrid deep learning model for accurate identification of internal ribosome entry sites in cellular and viral mRNAs.

Zhao J, Chen Z, Zhang M, Zou L, He S, Liu J. Brief Bioinform. 2024; 25(5).

PMID: 39234953 PMC: 11375421. DOI: 10.1093/bib/bbae439.


A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions.

Chu Y, Yu D, Li Y, Huang K, Shen Y, Cong L. Nat Mach Intell. 2024; 6(4):449-460.

PMID: 38855263 PMC: 11155392. DOI: 10.1038/s42256-024-00823-9.


Parvovirus B19 and Human Parvovirus 4 Encode Similar Proteins in a Reading Frame Overlapping the VP1 Capsid Gene.

Karlin D. Viruses. 2024; 16(2).

PMID: 38399966 PMC: 10891878. DOI: 10.3390/v16020191.


PML Body Biogenesis: A Delicate Balance of Interactions.

Silonov S, Smirnov E, Kuznetsova I, Turoverov K, Fonin A. Int J Mol Sci. 2023; 24(23).

PMID: 38069029 PMC: 10705990. DOI: 10.3390/ijms242316702.

