
Prediction of Protein Structural Class Using Novel Evolutionary Collocation-based Sequence Representation

Overview
Journal J Comput Chem
Publisher Wiley
Specialties Biology, Chemistry
Date 2008 Feb 23
PMID 18293306
Citations 36
Abstract

Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were compared with a wide range of recently proposed methods for prediction of structural classes. The comparison shows the superiority of the proposed representation, which yields error rate reductions of between 14% and 26% when compared with the predictions of the best-performing previously published classifiers on the considered datasets. The study also shows that, for the three benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
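The abstract does not give the formula behind the profile-based collocation of AA pairs, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes the profile is an L x 20 matrix of per-position residue scores, that columns follow the order in `AMINO_ACIDS`, and that a pair feature is the average product of the scores of two residues separated by a fixed `gap`. The names `collocation_features`, `one_hot_profile`, and `gap` are hypothetical, and the one-hot profile is a stand-in for real PSI-BLAST output.

```python
from typing import List

# Assumed column order of the profile; a real PSSM may order residues differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def collocation_features(profile: List[List[float]], gap: int = 1) -> List[float]:
    """Return 400 pair features from an L x 20 profile (illustrative only).

    Feature (a, b) is the average of profile[k][a] * profile[k + gap][b]
    over every position k for which both rows exist.
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    n_pairs = len(profile) - gap
    if n_pairs < 1:
        raise ValueError("profile is too short for this gap")
    n = len(AMINO_ACIDS)
    features = [0.0] * (n * n)
    for k in range(n_pairs):
        left, right = profile[k], profile[k + gap]
        for a in range(n):
            for b in range(n):
                features[a * n + b] += left[a] * right[b]
    return [value / n_pairs for value in features]


def one_hot_profile(sequence: str) -> List[List[float]]:
    """Stand-in for a PSI-BLAST profile: 1.0 for the observed residue, else 0.0."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


if __name__ == "__main__":
    feats = collocation_features(one_hot_profile("ACAC"), gap=1)
    ac = AMINO_ACIDS.index("A") * 20 + AMINO_ACIDS.index("C")
    print(len(feats), feats[ac])
```

With the one-hot stand-in, the features reduce to ordinary dipeptide frequencies. Substituting a normalized PSI-BLAST profile would let residues that are evolutionarily likely at each position contribute to the pair counts, which is the general idea the abstract describes.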

Citing Articles

A privacy-preserving approach for cloud-based protein fold recognition.

Unal A, Pfeifer N, Akgun M. Patterns (N Y). 2024; 5(9):101023.

PMID: 39568647 PMC: 11573750. DOI: 10.1016/j.patter.2024.101023.


ResNetKhib: a novel cell type-specific tool for predicting lysine 2-hydroxyisobutylation sites via transfer learning.

Jia X, Zhao P, Li F, Qin Z, Ren H, Li J. Brief Bioinform. 2023; 24(2).

PMID: 36880172 PMC: 10185920. DOI: 10.1093/bib/bbad063.


Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions.

Contreras-Torres E, Marrero-Ponce Y, Teran J, Aguero-Chapin G, Antunes A, Garcia-Jacas C. Front Chem. 2022; 10:959143.

PMID: 36277354 PMC: 9585278. DOI: 10.3389/fchem.2022.959143.


BBPpredict: A Web Service for Identifying Blood-Brain Barrier Penetrating Peptides.

Chen X, Zhang Q, Li B, Lu C, Yang S, Long J. Front Genet. 2022; 13:845747.

PMID: 35656322 PMC: 9152268. DOI: 10.3389/fgene.2022.845747.


Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences.

Wang Y, Xu Y, Yang Z, Liu X, Dai Q. Comput Math Methods Med. 2021; 2021:5529389.

PMID: 34055035 PMC: 8123985. DOI: 10.1155/2021/5529389.