
Identify High-Quality Protein Structural Models by Enhanced K-Means

Overview
Journal Biomed Res Int
Publisher Wiley
Date 2017 Apr 20
PMID 28421198
Citations 4
Abstract

One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance depends on the conformational decoys supplied, and, for some algorithms, the accuracy declines when the decoy population increases. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust than SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
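The two initialization strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how decoys are represented or compared, so the code assumes each decoy has already been reduced to a plain coordinate tuple and uses squared Euclidean distance. The function names (`kmeanspp_seeds`, `kmeans`, `assign`) are hypothetical. Seeding weighted by squared distance corresponds to the K-means++ idea, while passing externally chosen centroids (for example, ones produced by another clustering tool) to `kmeans` mirrors the general shape of the SPICKER-seeded variant.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Choose k initial centroids, each drawn with probability proportional
    to its squared distance from the nearest centroid chosen so far."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    seeds = [rng.choice(points)]  # first centroid: uniform random choice
    # d2[i] is the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to a uniform draw.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]
    return seeds


def assign(points, centroids):
    """Group points by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, seeds, iterations=50):
    """Lloyd iterations starting from the supplied seeds.

    The seeds may come from kmeanspp_seeds or from any external source.
    """
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                dim = len(old)
                updated.append(tuple(sum(m[i] for m in members) / len(members)
                                     for i in range(dim)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

A call such as `kmeans(points, kmeanspp_seeds(points, 2, random.Random(0)))` runs the squared-distance-seeded variant; replacing the second argument with a list of precomputed centroids runs the externally seeded one. Real decoy clustering would substitute a structural distance for `squared_distance`, which this sketch does not attempt.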

Citing Articles

Research on RNA secondary structure predicting via bidirectional recurrent neural network.

Lu W, Cao Y, Wu H, Ding Y, Song Z, Zhang Y. BMC Bioinformatics. 2021; 22(Suppl 3):431.

PMID: 34496763 PMC: 8427827. DOI: 10.1186/s12859-021-04332-z.


Research on predicting 2D-HP protein folding using reinforcement learning with full state space.

Wu H, Yang R, Fu Q, Chen J, Lu W, Li H. BMC Bioinformatics. 2019; 20(Suppl 25):685.

PMID: 31874607 PMC: 6929271. DOI: 10.1186/s12859-019-3259-6.


Ranking near-native candidate protein structures via random forest classification.

Wu H, Huang H, Lu W, Fu Q, Ding Y, Qiu J. BMC Bioinformatics. 2019; 20(Suppl 25):683.

PMID: 31874596 PMC: 6929337. DOI: 10.1186/s12859-019-3257-8.


Machine learning for epigenetics and future medical applications.

Holder L, Haque M, Skinner M. Epigenetics. 2017; 12(7):505-514.

PMID: 28524769 PMC: 5687335. DOI: 10.1080/15592294.2017.1329068.
