Interpreting a Black Box Predictor to Gain Insights into Early Folding Mechanisms

Overview
Specialty: Biotechnology
Date: 2021 Sep 16
PMID: 34527196
Citations: 1
Abstract

Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are available for only a few proteins; these data previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow direct extraction of the 'early folding rules' embedded in the protein sequence, although such interpretation is essential to improve our understanding of how the folding process works. Here, we apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic, residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets: a default set composed of natural proteins, a scrambled set composed of the scrambled default set sequences, and a set of designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online at http://xefoldmine.bio2byte.be/ as a resource to help understand and steer early protein folding.

Citing Articles

Recent Advances in Protein Folding Pathway Prediction through Computational Methods.

Zhao K, Liang F, Xia Y, Hou M, Zhang G. Curr Med Chem. 2023; 31(26):4111-4126.

PMID: 37828669 DOI: 10.2174/0109298673265249231004193520.
