PseU-ST: A New Stacked Ensemble-learning Method for Identifying RNA Pseudouridine Sites
Pseudouridine (Ψ) is one of the most abundant RNA modifications, occurs in a variety of RNA types, and plays a significant role in many biological processes. Identifying Ψ sites is the key to studying the biochemical functions and mechanisms of Ψ. However, identifying Ψ sites experimentally is time-consuming and expensive, so computational methods that accurately predict Ψ sites from RNA sequence information are needed. In this study, we proposed a new model called PseU-ST to identify Ψ sites in H. sapiens, S. cerevisiae, and M. musculus. We selected the six best encoding schemes and four machine learning algorithms after a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square and incremental feature selection algorithms. We then selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison, and employed a stacking strategy to build the predictive model. The results demonstrated that PseU-ST achieved better prediction performance than existing models. The PseU-ST accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, exceeding the best existing methods on the same benchmark training datasets by 13.94%, 6.05%, and 0.26%, respectively. These data indicate that PseU-ST is a very competitive prediction model for identifying RNA Ψ sites in H. sapiens, S. cerevisiae, and M. musculus. In addition, we found that the Position-specific trinucleotide propensity based on single strand (PSTNPss) and Position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
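Because the abstract singles out PSTNPss as an important feature, a minimal sketch of that kind of encoding may help readers unfamiliar with it. The sketch assumes the commonly described definition, in which each position of a sequence is scored by the frequency of its trinucleotide at that position in positive samples minus the frequency in negative samples. It is not taken from the PseU-ST repository or from iLearnPlus; the function names (`position_frequencies`, `pstnp_encode`) and the toy sequences are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import product

# All 64 RNA trinucleotides (assumed alphabet: A, C, G, U).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def position_frequencies(seqs):
    """Per-position trinucleotide frequencies for equal-length sequences.

    Returns a list with one dict per position j (0 .. L-3), mapping each
    trinucleotide to the fraction of sequences that carry it at position j.
    """
    n_pos = len(seqs[0]) - 2
    counts = [Counter() for _ in range(n_pos)]
    for s in seqs:
        for j in range(n_pos):
            counts[j][s[j:j + 3]] += 1
    return [{t: c[t] / len(seqs) for t in TRINUCLEOTIDES} for c in counts]


def pstnp_encode(seq, pos_freq, neg_freq):
    """Encode a sequence as L-2 values: positive minus negative frequency
    of the trinucleotide observed at each position."""
    return [
        pos_freq[j].get(seq[j:j + 3], 0.0) - neg_freq[j].get(seq[j:j + 3], 0.0)
        for j in range(len(seq) - 2)
    ]


# Toy data (hypothetical, far shorter than real benchmark sequences).
positives = ["ACGUA", "ACGUU"]
negatives = ["UUGCA", "ACGCA"]

pos_freq = position_frequencies(positives)
neg_freq = position_frequencies(negatives)

print(pstnp_encode("ACGUA", pos_freq, neg_freq))  # [0.5, 1.0, 0.5]
print(pstnp_encode("UUGCA", pos_freq, neg_freq))  # [-0.5, -0.5, -1.0]
```

Positive values mark positions where the observed trinucleotide is more typical of Ψ-containing samples, and negative values mark the opposite. In a real evaluation the frequency tables would presumably be estimated from training folds only, so that the labels of held-out sequences do not leak into their own features.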