
CPPred: Coding Potential Prediction Based on the Global Description of RNA Sequence

Overview
Specialty Biochemistry
Date 2019 Feb 13
PMID 30753596
Citations 51
Abstract

Rapid and accurate methods for distinguishing coding RNAs from ncRNAs play a critical role in analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing. Previously developed methods (CPAT, CPC2 and PLEK) distinguish coding RNAs from ncRNAs very well, but perform poorly at distinguishing small coding RNAs from small ncRNAs. Herein, we report CPPred (coding potential prediction), an approach based on an SVM classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to these novel features, CPPred distinguishes not only coding RNAs from ncRNAs but also small coding RNAs from small ncRNAs better than state-of-the-art methods. A recent study proposed 1335 novel human coding RNAs from a large number of RNA-seq datasets. However, CPPred predicts only 119 of these transcripts as coding RNAs. In fact, almost all of the proposed novel coding RNAs (91.1%) are predicted to be ncRNAs, which is consistent with previous reports. Remarkably, we also reveal that the global description encoding features (T2, C0 and GC) play an important role in the prediction of coding potential.
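The abstract does not define the "global description" features, so the following is only a minimal sketch of a generic composition/transition/distribution (CTD) style encoding for a nucleotide sequence, plus GC content. The function name `ctd_features`, the feature names (`comp_*`, `trans_*`, `dist_*`, `gc`) and the quantile convention are assumptions made for illustration; they are not taken from CPPred and may not match how its T2, C0 and GC features are computed.

```python
import math
from itertools import combinations

ALPHABET = "ACGT"                      # U is mapped to T before counting
QUANTILES = (0.0, 0.25, 0.5, 0.75, 1.0)


def ctd_features(seq):
    """Return a dict of CTD-style descriptors for one sequence (illustrative only)."""
    seq = seq.upper().replace("U", "T")
    seq = "".join(b for b in seq if b in ALPHABET)   # drop ambiguous bases
    n = len(seq)
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")

    feats = {}

    # Composition: fraction of each nucleotide in the sequence.
    for b in ALPHABET:
        feats[f"comp_{b}"] = seq.count(b) / n

    # Transition: fraction of adjacent pairs that switch between two given
    # nucleotides, in either order (e.g. AC or CA).
    pairs = list(zip(seq, seq[1:]))
    for x, y in combinations(ALPHABET, 2):
        hits = sum(1 for p, q in pairs if {p, q} == {x, y})
        feats[f"trans_{x}{y}"] = hits / (n - 1) if n > 1 else 0.0

    # Distribution: relative position (1-based index / length) of the first,
    # 25%, 50%, 75% and last occurrence of each nucleotide.
    for b in ALPHABET:
        positions = [i + 1 for i, c in enumerate(seq) if c == b]
        for k, frac in enumerate(QUANTILES):
            if not positions:
                feats[f"dist_{b}{k}"] = 0.0   # assumed convention for absent bases
                continue
            idx = max(0, math.ceil(frac * len(positions)) - 1)
            feats[f"dist_{b}{k}"] = positions[idx] / n

    # GC content, derived from the composition terms.
    feats["gc"] = feats["comp_G"] + feats["comp_C"]
    return feats


if __name__ == "__main__":
    for name, value in sorted(ctd_features("AUGGCUAACGUUAG").items()):
        print(f"{name}\t{value:.3f}")
```

A feature vector of this kind, computed per transcript, is the sort of input an SVM classifier could be trained on; the feature set actually used by CPPred is described in the paper itself.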

Citing Articles

EnrichRBP: an automated and interpretable computational platform for predicting and analysing RNA-binding protein events.

Wang Y, Zhu H, Wang Y, Yang Y, Huang Y, Zhang J Bioinformatics. 2025; 41(1).

PMID: 39804669 PMC: 11783304. DOI: 10.1093/bioinformatics/btaf018.


Construction of a Dataset for All Expressed Transcripts for Alzheimer's Disease Research.

Huang Z, Shi B, Mu X, Qiao S, Xiao G, Wang Y Brain Sci. 2025; 14(12).

PMID: 39766379 PMC: 11674848. DOI: 10.3390/brainsci14121180.


Localization is the key to action: regulatory peculiarities of lncRNAs.

Poloni J, Oliveira F, Feltes B Front Genet. 2024; 15:1478352.

PMID: 39737005 PMC: 11683014. DOI: 10.3389/fgene.2024.1478352.


Iroquois homeobox 4 (IRX4) derived micropeptide promotes prostate cancer progression and chemoresistance through Wnt signalling dysregulation.

Fernando A, Liyanage C, Srinivasan S, Panchadsaram J, Rothnagel J, Clements J Commun Med (Lond). 2024; 4(1):224.

PMID: 39487222 PMC: 11530646. DOI: 10.1038/s43856-024-00613-9.


Full-length transcriptome assembly of black amur bream (Megalobrama terminalis) as a reference resource.

Liu K, Xie N Mol Biol Rep. 2024; 51(1):1101.

PMID: 39470845 DOI: 10.1007/s11033-024-10056-z.

