PseU-ST: A New Stacked Ensemble-learning Method for Identifying RNA Pseudouridine Sites

Overview
Journal: Front Genet
Date: 2023 Feb 6
PMID: 36741328
Abstract

Pseudouridine (Ψ) is one of the most abundant RNA modifications; it is found in a variety of RNA types and plays a significant role in many biological processes. Identifying Ψ sites is the key to studying the biochemical functions and mechanisms of Ψ. However, identifying Ψ sites experimentally is time-consuming and expensive, so computational methods that accurately predict Ψ sites from RNA sequence information are needed. In this study, we propose a new model, PseU-ST, to identify Ψ sites in Homo sapiens, Saccharomyces cerevisiae, and Mus musculus. We selected the six best encoding schemes and four machine learning algorithms after a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square ranking and incremental feature selection. We then selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison, and employed a stacking strategy to build the predictive model. PseU-ST achieved better prediction performance than existing models: its accuracy was 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, improvements of 13.94%, 6.05%, and 0.26% over the best existing methods on the same benchmark training datasets. These results indicate that PseU-ST is a very competitive model for identifying RNA Ψ sites in H. sapiens, S. cerevisiae, and M. musculus. In addition, we found that the position-specific trinucleotide propensity based on single strand (PSTNPss) and position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
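The feature-selection step mentioned in the abstract (chi-square ranking followed by incremental feature selection) can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation, and it rests on several assumptions: PS3 is taken to be a one-hot encoding of every overlapping trinucleotide (64 bits per position); the chi-square score is the classic 2x2 Pearson statistic, which may differ from what iLearnPlus computes; and a leave-one-out nearest-neighbour classifier stands in for the real base classifiers. All function names and the toy sequences are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
TRIMERS = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
TRIMER_INDEX = {t: i for i, t in enumerate(TRIMERS)}


def ps3_encode(seq):
    """Assumed PS3-style encoding: one 64-bit one-hot block per overlapping trinucleotide."""
    vec = []
    for i in range(len(seq) - 2):
        block = [0] * 64
        idx = TRIMER_INDEX.get(seq[i:i + 3])
        if idx is not None:  # trimers with unknown characters stay all-zero
            block[idx] = 1
        vec.extend(block)
    return vec


def chi2_binary(column, labels):
    """Pearson chi-square statistic of a binary feature against a binary label."""
    n = len(labels)
    a = sum(1 for x, y in zip(column, labels) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(column, labels) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(column, labels) if x == 0 and y == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # constant feature or single-class labels carry no signal
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_features(X, y):
    """Return feature indices sorted by decreasing chi-square score."""
    scores = [chi2_binary([row[j] for row in X], y) for j in range(len(X[0]))]
    # sorted() is stable, so tied features keep their original index order
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def incremental_feature_selection(ranked, evaluate, max_features=None):
    """Add features in ranked order; keep the shortest prefix with the best score."""
    limit = len(ranked) if max_features is None else min(max_features, len(ranked))
    best_k, best_score = 0, float("-inf")
    for k in range(1, limit + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' prefers the smaller feature set on ties
            best_k, best_score = k, score
    return ranked[:best_k], best_score


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Hamming distance)."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, None
        for j in range(len(X)):
            if j == i:
                continue
            dist = sum(X[i][f] != X[j][f] for f in features)
            if best_d is None or dist < best_d:
                best_j, best_d = j, dist
        correct += y[best_j] == y[i]
    return correct / len(X)


# Toy data invented for illustration: 5-nt windows centred on U, label 1 = Ψ site.
seqs = ["GAUCA", "GAUCC", "CCUAG", "ACUAG"]
labels = [1, 1, 0, 0]
X_toy = [ps3_encode(s) for s in seqs]
ranked_toy = rank_features(X_toy, labels)
selected, score = incremental_feature_selection(
    ranked_toy, lambda feats: loo_accuracy(X_toy, labels, feats), max_features=10
)
print(len(selected), score)
```

In a fuller pipeline, the selected features from each encoding would be concatenated and passed to several base classifiers whose cross-validated predictions train a meta-classifier, which is the general idea of stacking. The evaluation callback here is deliberately pluggable so that any cross-validated model can replace the toy nearest-neighbour scorer.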

Citing Articles

Meta-2OM: A multi-classifier meta-model for the accurate prediction of RNA 2'-O-methylation sites in human RNA.

Harun-Or-Roshid M, Pham N, Manavalan B, Kurata H PLoS One. 2024; 19(6):e0305406.

PMID: 38924058 PMC: 11207182. DOI: 10.1371/journal.pone.0305406.


Exploring the Potential of GANs in Biological Sequence Analysis.

Murad T, Ali S, Patterson M Biology (Basel). 2023; 12(6).

PMID: 37372139 PMC: 10295061. DOI: 10.3390/biology12060854.
