Critical Assessment of High-throughput Standalone Methods for Secondary Structure Prediction

Overview

Journal Brief Bioinform

Publisher Oxford University Press

Specialty Biology

Date 2011 Jan 22

PMID 21252072

Citations 22

Authors

Hua Zhang

Tuo Zhang

Ke Chen

Kanaka Durga Kedarisetti

Marcin J Mizianty

Qingbo Bao

Wojciech Stach

Lukasz Kurgan

Abstract

Sequence-based prediction of protein secondary structure (SS) enjoys widespread and increasing use in the analysis and prediction of numerous structural and functional characteristics of proteins. Because there is no recent, comprehensive and large-scale comparison of the many available prediction methods, the choice of an SS predictor is often arbitrary. To address this gap, we compare and analyze 12 popular standalone, high-throughput predictors on a large set of 1975 proteins to provide in-depth, novel and practical insights. We show that there is no universally best predictor. Detailed comparative studies are therefore needed to support an informed selection of SS predictors for a given application. Our study shows that the three-state accuracy (Q3) and segment overlap (SOV3) of SS prediction currently reach 82% and 81%, respectively. We demonstrate that carefully designed consensus-based predictors improve Q3 by an additional 2%, and that homology modeling-based methods are significantly better than ab initio approaches, by 1.5% in Q3. Our empirical analysis reveals that solvent-exposed and flexible coils are predicted with higher quality than buried and rigid coils, while the inverse is true for strands and helices. We also show that longer helices are easier to predict, in contrast to longer strands, which are harder to find. Current methods confuse 1-6% of strand residues with helical residues and vice versa, and they perform poorly for residues in the β-bridge and 3₁₀-helix conformations. Finally, we compare the predictions of the standalone implementations of four well-performing methods with those of their corresponding web servers.
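To make two of the residue-level quantities reported above concrete, the following sketch computes three-state accuracy (Q3) and a strand-to-helix confusion rate from a pair of aligned label strings. It is a minimal illustration under stated assumptions and not the evaluation code used in the study. The DSSP 8-to-3 state mapping (H, G, I to helix; E, B to strand; everything else to coil) is one common convention assumed here. The function names `reduce_to_three_states`, `q3` and `confusion_rate` are hypothetical. Q3 is also pooled over the residues of a single chain rather than averaged per protein. SOV3 is omitted because its segment-based definition is considerably more involved.

```python
# Minimal sketch with hypothetical helpers; not the assessed study's code.

# ASSUMPTION: one common DSSP 8-state -> 3-state reduction; the mapping
# used in the original assessment may differ.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # alpha, 3(10) and pi helices -> helix
    "E": "E", "B": "E",                      # extended strand and beta-bridge -> strand
    "T": "C", "S": "C", "-": "C", "C": "C",  # turns, bends and other -> coil
}


def reduce_to_three_states(dssp: str) -> str:
    """Map an 8-state DSSP string onto the H/E/C alphabet."""
    return "".join(EIGHT_TO_THREE[s] for s in dssp)


def q3(observed: str, predicted: str) -> float:
    """Percent of residues whose predicted H/E/C label equals the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def confusion_rate(observed: str, predicted: str, true_state: str, pred_state: str) -> float:
    """Percent of residues observed in true_state that were predicted as pred_state."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must be of equal length")
    preds = [p for o, p in zip(observed, predicted) if o == true_state]
    if not preds:
        return 0.0
    return 100.0 * sum(p == pred_state for p in preds) / len(preds)


# Toy example with made-up labels (not data from the study).
observed = reduce_to_three_states("-HHHHGGG-EEEBB--")
predicted = "CHHHHHCCCEEEHHCC"
print(observed)                                       # CHHHHHHHCEEEEECC
print(q3(observed, predicted))                        # 75.0
print(confusion_rate(observed, predicted, "E", "H"))  # 40.0
```

The confusion rate is conditioned on the observed state, so it reports the share of true strand residues that were labelled as helix. This is one reading of how the strand/helix confusion is phrased in the abstract.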

Citing Articles

DescribePROT Database of Residue-Level Protein Structure and Function Annotations.

Zhao B, Basu S, Kurgan L. Methods Mol Biol. 2024; 2867:169-184.

PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10.

Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences.

Zhang J, Qian J, Zou Q, Zhou F, Kurgan L. Methods Mol Biol. 2024; 2870:1-19.

PMID: 39543027 DOI: 10.1007/978-1-0716-4213-9_1.

Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses.

Basu S, Kurgan L. Comput Struct Biotechnol J. 2024; 23:1968-1977.

PMID: 38765610 PMC: 11098722. DOI: 10.1016/j.csbj.2024.04.059.

Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction.

Song J, Kurgan L. Bioinform Adv. 2023; 3(1):vbad184.

PMID: 38146538 PMC: 10749743. DOI: 10.1093/bioadv/vbad184.

The N-terminal intrinsically disordered region of Ncb5or docks with the cytochrome b core to form a helical motif that is of ancient origin.

Benson D, Deng B, Kashipathy M, Lovell S, Battaile K, Cooper A. Proteins. 2023; 92(4):554-566.

PMID: 38041394 PMC: 10932899. DOI: 10.1002/prot.26647.