
Incorporating Physics to Overcome Data Scarcity in Predictive Modeling of Protein Function: A Case Study of BK Channels

Overview
Specialty Biology
Date 2023 Sep 15
PMID 37713443
Abstract

Machine learning has played transformative roles in numerous chemical and biophysical problems, such as protein folding, where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches because of data scarcity. One approach to overcoming data scarcity is to incorporate physical principles, for example through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels, which play important roles in cardiovascular and neural systems. Many BK channel mutants are associated with various neurological and cardiovascular diseases, but their molecular effects are unknown. The voltage gating properties of BK channels have been characterized experimentally for 473 site-specific mutations over the last three decades; yet these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both the open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that can reproduce unseen experimentally measured shifts in gating voltage, ∆V1/2, with an RMSE of ~32 mV and a correlation coefficient of R ~ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix; mutations at these two positions are predicted to have opposing effects on V1/2, suggesting a key role of S5 in mediating voltage sensor-pore coupling. The measured ∆V1/2 values agree quantitatively with the predictions for all four mutations, with a high correlation of R = 0.92 and an RMSE of 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.
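
The two agreement metrics quoted in the abstract, RMSE and the correlation coefficient R, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `rmse` and `pearson_r` are assumptions, R is assumed here to be the Pearson correlation, and the ∆V1/2 values are invented purely for illustration and are not data from the study.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical gating-voltage shifts in mV, invented for illustration only.
predicted_shift_mv = [-40.0, -10.0, 15.0, 35.0]
measured_shift_mv = [-30.0, -20.0, 25.0, 25.0]

print(f"RMSE = {rmse(predicted_shift_mv, measured_shift_mv):.1f} mV")
print(f"R    = {pearson_r(predicted_shift_mv, measured_shift_mv):.2f}")
```

The two metrics answer different questions: RMSE reports the typical size of the prediction error in mV, whereas R reports only how well the predictions track the ordering and relative spread of the measurements. With only four points, as in the S5 validation set, both estimates carry substantial uncertainty.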

Citing Articles

AAindexNC: Estimating the Physicochemical Properties of Non-Canonical Amino Acids, Including Those Derived from the PDB and PDBeChem Databank.

Milchevskiy Y, Kravatskaya G, Kravatsky Y Int J Mol Sci. 2024; 25(23).

PMID: 39684267 PMC: 11641631. DOI: 10.3390/ijms252312555.


Biophysics-based protein language models for protein engineering.

Gelman S, Johnson B, Freschlin C, Sharma A, DCosta S, Peters J bioRxiv. 2024; .

PMID: 38559182 PMC: 10980077. DOI: 10.1101/2024.03.15.585128.
