
PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting LncRNAs from Transcripts

Overview
Journal: Genes (Basel)
Publisher: MDPI
Date: 2019 Sep 6
PMID: 31484412
Citations: 9
Abstract

Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bp) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method, abbreviated as PredLnc-GFStack, to predict lncRNAs from transcripts. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experimental results show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
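
The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the candidate features, the fitness function, or the GA settings, so everything here is an assumption made for illustration. The sketch uses k-mer frequencies plus GC content as stand-in global sequence features, a nearest-centroid classifier's training accuracy as a stand-in fitness, and hypothetical helper names (`kmer_features`, `centroid_accuracy`, `ga_select`). The toy sequences are invented strings, not real transcripts, and the stacked ensemble itself is not shown.

```python
import random
from itertools import product


def kmer_features(seq, k=2):
    """Normalized k-mer frequencies followed by GC content (assumed feature set)."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
    total = max(sum(counts.values()), 1)
    gc = (seq.count("G") + seq.count("C")) / max(len(seq), 1)
    return [counts[m] / total for m in kmers] + [gc]


def centroid_accuracy(X, y, mask):
    """Assumed fitness: training accuracy of a nearest-centroid classifier
    restricted to the features switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == lab
    return correct / len(y)


def ga_select(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]

    def fitness(mask):
        return centroid_accuracy(X, y, mask)

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Invented toy data: label 1 = "coding-like", 0 = "non-coding-like".
toy = [
    ("ATGGCGGCCGCGGGCTGA", 1),
    ("ATGCCGGGCGCCGCGTAA", 1),
    ("ATATTTAATATTATAAAT", 0),
    ("TTATAAATTTATATATAA", 0),
]
X = [kmer_features(s) for s, _ in toy]
y = [label for _, label in toy]
best_mask = ga_select(X, y)
selected = [j for j, keep in enumerate(best_mask) if keep]
```

In a fuller pipeline, the columns listed in `selected` would be passed to the downstream classifier; a cross-validated score would be a more sensible fitness than training accuracy, which is used here only to keep the sketch short.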

Citing Articles

Decoding the Non-coding: Tools and Databases Unveiling the Hidden World of "Junk" RNAs for Innovative Therapeutic Exploration.

Chaudhary U, Banerjee S. ACS Pharmacol Transl Sci. 2024; 7(7):1901-1915.

PMID: 39022352 PMC: 11249652. DOI: 10.1021/acsptsci.3c00388.


RNAincoder: a deep learning-based encoder for RNA and RNA-associated interaction.

Wang Y, Chen Z, Pan Z, Huang S, Liu J, Xia W. Nucleic Acids Res. 2023; 51(W1):W509-W519.

PMID: 37166951 PMC: 10320175. DOI: 10.1093/nar/gkad404.


Computational prediction of disease related lncRNAs using machine learning.

Khalid R, Naveed H, Khalid Z. Sci Rep. 2023; 13(1):806.

PMID: 36646775 PMC: 9842610. DOI: 10.1038/s41598-023-27680-7.


A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs.

Singh D, Roy J. Nucleic Acids Res. 2022; 50(21):12094-12111.

PMID: 36420898 PMC: 9757047. DOI: 10.1093/nar/gkac1092.


Opportunities and Challenges of Predictive Approaches for the Non-coding RNA in Plants.

Xu D, Yuan W, Fan C, Liu B, Lu M, Zhang J. Front Plant Sci. 2022; 13:890663.

PMID: 35498708 PMC: 9048598. DOI: 10.3389/fpls.2022.890663.

