
CPC: Assess the Protein-coding Potential of Transcripts Using Sequence Features and Support Vector Machine

Overview
Specialty: Biochemistry
Date: 2007 Jul 19
PMID: 17631615
Citations: 1431
Abstract

Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but instead function as noncoding RNAs (ncRNAs). As large-scale cDNA and EST sequencing projects generate millions of transcripts every year, there is a need for automatic methods that distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC can discriminate coding from noncoding transcripts with high accuracy. Furthermore, CPC runs an order of magnitude faster than a previous state-of-the-art tool while achieving higher accuracy. We developed a user-friendly web-based interface for CPC at http://cpc.cbi.pku.edu.cn. In addition to predicting the coding potential of the input transcripts, the CPC web server graphically displays detailed sequence features and additional annotations of each transcript, which may facilitate further investigation by users.
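To make the idea of a "sequence feature" concrete, the following Python sketch computes two simple descriptors of a transcript: the length of its longest open reading frame (ORF) and the fraction of the transcript that ORF covers. The abstract does not list CPC's six features, so these two are illustrative assumptions rather than CPC's actual feature set, and the function names `longest_orf` and `orf_features` are hypothetical. In a classifier of the kind described above, numeric values like these would form the input vector to a support vector machine.

```python
# Illustrative sketch only: these ORF descriptors are assumed stand-ins,
# not the six features used by CPC (the abstract does not name them).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Find the longest ATG-to-stop ORF in the three forward reading frames.

    Returns (start, end) with end exclusive and the stop codon included,
    or (0, 0) when no complete ORF is present.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Return the longest-ORF length and its coverage of the transcript."""
    start, end = longest_orf(seq)
    length = end - start
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}


if __name__ == "__main__":
    print(orf_features("CCATGAAATTTTAGCC"))
```

A long ORF that covers most of a transcript is weak evidence of coding potential on its own, which is why a trained classifier would combine several such features instead of thresholding one.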

Citing Articles

LTR retrotransposon-derived novel lncRNA2 enhances cold tolerance in Moso bamboo by modulating antioxidant activity and photosynthetic efficiency.

Zhao J, Ding Y, Ramakrishnan M, Zou L, Chen Y, Zhou M PeerJ. 2025; 13:e19056.

PMID: 40028216 PMC: 11871892. DOI: 10.7717/peerj.19056.


Full-length transcriptome analysis of a bloom-forming dinoflagellate Scrippsiella acuminata (Dinophyceae).

Li F, Yue C, Deng Y, Tang Y Sci Data. 2025; 12(1):352.

PMID: 40016213 PMC: 11868372. DOI: 10.1038/s41597-025-04699-1.


Integrated Transcriptome and Proteome Analysis Provides New Insights into Starch and Sucrose Metabolism and Regulation of Corm Expansion Process in .

Zou C, He F, Li H, Liu L, Qiu Z, Dong W Biology (Basel). 2025; 14(2).

PMID: 40001941 PMC: 11851817. DOI: 10.3390/biology14020173.


Identification and Co-expression Analysis of Differentially Expressed LncRNAs and mRNAs Regulate Intramuscular Fat Deposition in Yaks at Two Developmental Stages.

Gao Z, Su Q, Raza S, Piras C, BinMowyna M, Al-Zahrani M Biochem Genet. 2025; .

PMID: 39971835 DOI: 10.1007/s10528-025-11046-x.


Simultaneous profiling of chromatin-associated RNA at targeted DNA loci and RNA-RNA Interactions through TaDRIM-seq.

Ding C, Chen G, Luan S, Gao R, Fan Y, Zhang Y Nat Commun. 2025; 16(1):1500.

PMID: 39929795 PMC: 11811046. DOI: 10.1038/s41467-024-53534-5.

