PseUI: Pseudouridine Sites Identification Based on RNA Sequence Information

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2018 Aug 31

PMID 30157750

Citations 50

Authors

Jingjing He

Ting Fang

Zizheng Zhang

Bei Huang

Xiaolei Zhu

Yi Xiong

Abstract

Background: Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, and it significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding these cellular processes. Because currently available experimental methods are inefficient and costly, it is highly desirable to develop computational methods that detect Ψ sites in RNA sequences accurately and efficiently. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement.

Results: In this study, we developed a new model, PseUI, for identifying Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. First, five kinds of features, namely nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP), were generated from RNA segments. Then, a sequential forward feature selection strategy was used to obtain a compact feature subset with strong discriminative power. Based on the selected feature subsets, we built our model using a support vector machine (SVM). Finally, the generalization ability of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and brief instructions for using the web server are given in this paper. By following these instructions, academic users can conveniently obtain their desired results without complicated calculations.
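The encodings and the selection step named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PseUI implementation: segments are equal-length RNA strings over A/C/G/U; PSNP and PSDP are assumed to be the per-position frequency of a nucleotide (k=1) or dinucleotide (k=2) in positive (Ψ) training segments minus its frequency in negative segments; and the forward selection takes a caller-supplied score function, which could be, for example, the jackknife accuracy of an SVM trained on the chosen feature columns (not implemented here). All function names are hypothetical, and pseDNC is omitted because it requires physicochemical property tables that this section does not give.

```python
from itertools import product

BASES = "ACGU"
# All 16 dinucleotides in a fixed order so feature vectors line up across segments.
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def nc(seq):
    """NC: frequency of each of the 4 bases in the segment."""
    return [seq.count(b) / len(seq) for b in BASES]


def dc(seq):
    """DC: frequency of each of the 16 overlapping dinucleotides."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DINUCS]


def position_freqs(seqs, k):
    """Per-position frequency of each k-mer across equal-length segments."""
    width = len(seqs[0]) - k + 1
    counts = [{} for _ in range(width)]
    for s in seqs:
        for i in range(width):
            kmer = s[i:i + k]
            counts[i][kmer] = counts[i].get(kmer, 0) + 1
    return [{m: c / len(seqs) for m, c in col.items()} for col in counts]


def propensity_table(pos_seqs, neg_seqs, k):
    """Assumed PSNP (k=1) / PSDP (k=2): positive minus negative frequency per position."""
    fp, fn = position_freqs(pos_seqs, k), position_freqs(neg_seqs, k)
    return [
        {m: fp[i].get(m, 0.0) - fn[i].get(m, 0.0) for m in set(fp[i]) | set(fn[i])}
        for i in range(len(fp))
    ]


def encode_propensity(seq, table, k):
    """Map the k-mer at each position to its propensity; unseen k-mers score 0."""
    return [table[i].get(seq[i:i + k], 0.0) for i in range(len(table))]


def sequential_forward_selection(n_features, score, max_features=None):
    """Greedy SFS: add the feature index that most improves score(subset); stop when none does."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        cand_score, cand = max((score(selected + [j]), j) for j in remaining)
        if cand_score <= best:
            break  # no remaining feature improves the score
        best = cand_score
        selected.append(cand)
        remaining.remove(cand)
    return selected, best


if __name__ == "__main__":
    pos, neg = ["AU", "AG"], ["CU", "CU"]  # toy positive / negative segments
    psnp = propensity_table(pos, neg, 1)
    print(nc("ACGU"), encode_propensity("AU", psnp, 1))  # prints [0.25, 0.25, 0.25, 0.25] [1.0, -0.5]
```

Note that the propensity tables are built from training labels only, so in a jackknife or independent test they would need to be recomputed without the held-out segment to avoid leaking its label into its own features.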

Conclusion: In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. PseUI outperformed the existing state-of-the-art models. We expect that PseUI will become a useful tool for the accurate identification of RNA Ψ sites.

Citing Articles

Bioinformatics for Inosine: Tools and Approaches to Trace This Elusive RNA Modification.

Bortoletto E, Rosani U. Genes (Basel). 2024; 15(8).

PMID: 39202357 PMC: 11353476. DOI: 10.3390/genes15080996.

PseUpred-ELPSO Is an Ensemble Learning Predictor with Particle Swarm Optimizer for Improving the Prediction of RNA Pseudouridine Sites.

Wang X, Li P, Wang R, Gao X. Biology (Basel). 2024; 13(4).

PMID: 38666860 PMC: 11048358. DOI: 10.3390/biology13040248.

Fuzzy kernel evidence Random Forest for identifying pseudouridine sites.

Chen M, Sun M, Su X, Tiwari P, Ding Y. Brief Bioinform. 2024; 25(3).

PMID: 38622357 PMC: 11018548. DOI: 10.1093/bib/bbae169.

Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species.

Wang R, Chung C, Lee T. Int J Mol Sci. 2024; 25(5).

PMID: 38474116 PMC: 10932270. DOI: 10.3390/ijms25052869.

PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites.

Zhang X, Wang S, Xie L, Zhu Y. Front Genet. 2023; 14:1121694.

PMID: 36741328 PMC: 9892456. DOI: 10.3389/fgene.2023.1121694.
