PseU-ST: A New Stacked Ensemble-learning Method for Identifying RNA Pseudouridine Sites

Overview

Journal Front Genet

Date 2023 Feb 6

PMID 36741328

Authors

Xinru Zhang

Shutao Wang

Lina Xie

Yuhui Zhu

Abstract

Pseudouridine (Ψ) is one of the most abundant RNA modifications. It occurs in many RNA types and plays a significant role in many biological processes. Identifying Ψ sites is the key to studying the biochemical functions and mechanisms of Ψ, but experimental identification is time-consuming and expensive. Computational methods that accurately predict Ψ sites from RNA sequence information are therefore needed. In this study, we proposed a new model called PseU-ST to identify Ψ sites in Homo sapiens, Saccharomyces cerevisiae, and Mus musculus. We selected the six best encoding schemes and four machine learning algorithms after a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square and incremental feature selection algorithms. We then selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison, and employed a stacking strategy to build the predictive model. PseU-ST achieved better prediction performance than other existing models. Its accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, which are 13.94%, 6.05%, and 0.26% higher than those of the best existing methods on the same benchmark training datasets. These results indicate that PseU-ST is a very competitive model for identifying RNA Ψ sites in H. sapiens, S. cerevisiae, and M. musculus. In addition, we found that the Position-specific trinucleotide propensity based on single strand (PSTNPss) and Position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
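The workflow summarized in the abstract (chi-square ranking, incremental feature selection, then stacking) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation, which is in the repository linked above. The data here are synthetic, the function names are hypothetical, and the base learners (random forest and SVM with a logistic-regression meta-learner) are placeholders, since the abstract does not name the four algorithms that were chosen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def rank_features_by_chi2(X, y):
    """Return feature indices ordered from highest to lowest chi-square score."""
    scores, _ = chi2(X, y)          # chi2 expects non-negative feature values
    scores = np.nan_to_num(scores)  # constant features give NaN; treat as score 0
    return np.argsort(scores)[::-1]


def incremental_feature_selection(X, y, ranking, step=5, cv=5):
    """Grow the feature set in ranked order and keep the size with the best CV accuracy."""
    probe = LogisticRegression(max_iter=1000)  # assumed probe classifier
    best_k, best_acc = 0, -1.0
    for k in range(step, len(ranking) + 1, step):
        acc = cross_val_score(probe, X[:, ranking[:k]], y, cv=cv).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc


def build_stacked_model():
    """Stack two placeholder base learners under a logistic-regression meta-learner."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", SVC(random_state=0)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic stand-in for an encoded sequence dataset (assumption: 200 samples, 40 features).
X_raw, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, random_state=0
)
X = MinMaxScaler().fit_transform(X_raw)  # rescale to [0, 1] so chi2 is applicable

ranking = rank_features_by_chi2(X, y)
selected, ifs_acc = incremental_feature_selection(X, y, ranking)
stack_acc = cross_val_score(build_stacked_model(), X[:, selected], y, cv=5).mean()

print(f"features kept: {len(selected)} of {X.shape[1]}")
print(f"IFS probe accuracy: {ifs_acc:.3f}, stacked accuracy: {stack_acc:.3f}")
```

For brevity the sketch ranks and selects features on the full dataset before cross-validating the stacked model, which gives optimistic accuracy estimates. A careful evaluation would nest scaling and selection inside each training fold or use an independent test set.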

Citing Articles

Meta-2OM: A multi-classifier meta-model for the accurate prediction of RNA 2'-O-methylation sites in human RNA.

Harun-Or-Roshid M, Pham N, Manavalan B, Kurata H. PLoS One. 2024; 19(6):e0305406.

PMID: 38924058 PMC: 11207182. DOI: 10.1371/journal.pone.0305406.

Exploring the Potential of GANs in Biological Sequence Analysis.

Murad T, Ali S, Patterson M. Biology (Basel). 2023; 12(6).

PMID: 37372139 PMC: 10295061. DOI: 10.3390/biology12060854.
