PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting LncRNAs from Transcripts
Overview
Long non-coding RNAs (lncRNAs) are RNAs longer than 200 base pairs (bp) that do not encode proteins but nevertheless perform many vital biological functions. Advances in high-throughput sequencing technology have uncovered a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method, abbreviated as PredLnc-GFStack, to predict lncRNAs from transcripts. We select the critical features from a candidate feature list using a genetic algorithm (GA) and then apply stacked ensemble learning to construct the PredLnc-GFStack model. Computational experiments show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
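To make the feature-selection step concrete, the following is a minimal sketch of GA-based feature selection over a binary feature mask. It is an illustration under stated assumptions, not the PredLnc-GFStack implementation: the function names (`make_toy_data`, `fitness`, `select_features`), the synthetic data, the GA settings, and the per-feature penalty are all hypothetical, and a simple nearest-centroid classifier stands in for the stacked ensemble as the fitness evaluator.

```python
import random


def make_toy_data(n_samples=80, n_features=8, seed=1):
    """Synthetic stand-in for transcript features (hypothetical data).

    Features 0 and 1 carry the class signal; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = (i // 2) % 2  # both classes appear at even and odd indices
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, X, y):
    """Hold-out accuracy of a nearest-centroid classifier on the selected
    features, minus a small penalty per feature (assumed, to favor small sets)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    train = range(0, len(X), 2)  # even rows for fitting
    test = range(1, len(X), 2)   # odd rows for scoring
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(cols, cen))
            for label, cen in centroids.items()
        }
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test) - 0.01 * len(cols)


def select_features(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=0):
    """Evolve binary masks with truncation selection, one-point crossover,
    bit-flip mutation, and elitism. Returns (best_mask, best_fitness_history)."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        history.append(fitness(ranked[0], X, y))
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, X, y))
    history.append(fitness(best, X, y))
    return best, history


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, history = select_features(X, y)
    print("selected feature indices:", [j for j, bit in enumerate(mask) if bit])
    print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. In a setting closer to the paper, the fitness function would instead score a candidate feature subset with the downstream classifier, for example by cross-validated accuracy.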