
Learnt Representations of Proteins Can Be Used for Accurate Prediction of Small Molecule Binding Sites on Experimentally Determined and Predicted Protein Structures

Overview
Journal: J Cheminform
Publisher: BioMed Central
Specialty: Chemistry
Date: 2024 Mar 15
PMID: 38486231
Abstract

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient for understanding performance on novel protein targets for which experimental structures are not available. An alternative is to provide computationally predicted protein structures, but this is not commonly tested. However, because of the data used to train structure predictors, computationally predicted protein structures tend to be extremely accurate and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show not only that IF-SitePred is competitive with state-of-the-art methods when predicting binding sites on experimental structures, but also that it performs better on proxies for novel proteins in which low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods when ensembles of predicted protein structures are generated.
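The abstract names the stages of the pipeline (per-residue labelling of ESM-IF1 embeddings, point cloud annotation, clustering) but not their details. The sketch below illustrates only a generic annotate-then-cluster step, assuming per-residue binding labels are already available (for example from a classifier applied to the embeddings). The function names, the 6 Å voting radius, the vote threshold, and the single-linkage clustering are illustrative assumptions, not IF-SitePred's published implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def annotate_points(points: Sequence[Point],
                    residue_coords: Sequence[Point],
                    residue_is_binding: Sequence[bool],
                    radius: float = 6.0) -> List[int]:
    """Hypothetical annotation: count predicted-binding residues within
    `radius` (assumed Å) of each point in the cloud."""
    binding = [c for c, b in zip(residue_coords, residue_is_binding) if b]
    return [sum(1 for r in binding if math.dist(p, r) <= radius) for p in points]


def cluster_points(points: Sequence[Point], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering (an assumed choice): connected components
    of the graph joining points no farther apart than `cutoff`."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def predict_sites(points: Sequence[Point],
                  residue_coords: Sequence[Point],
                  residue_is_binding: Sequence[bool],
                  radius: float = 6.0,
                  min_votes: int = 2,
                  cutoff: float = 3.0) -> List[Point]:
    """Hypothetical pipeline: keep well-supported points, cluster them,
    and return cluster centroids with the largest cluster first."""
    votes = annotate_points(points, residue_coords, residue_is_binding, radius)
    kept = [p for p, v in zip(points, votes) if v >= min_votes]
    clusters = sorted(cluster_points(kept, cutoff), key=len, reverse=True)
    centroids: List[Point] = []
    for members in clusters:
        xs, ys, zs = zip(*(kept[i] for i in members))
        n = len(members)
        centroids.append((sum(xs) / n, sum(ys) / n, sum(zs) / n))
    return centroids


# Toy usage: three predicted-binding residues near the origin, one distant
# non-binding residue, and three candidate points in the cloud.
residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (20.0, 0.0, 0.0)]
flags = [True, True, True, False]
cloud = [(0.5, 0.5, 0.0), (1.0, 1.0, 0.0), (20.0, 1.0, 0.0)]
sites = predict_sites(cloud, residues, flags)
print(sites)  # [(0.75, 0.75, 0.0)]
```

Ranking clusters by size is one simple way to order candidate sites; a scoring scheme based on summed classifier probabilities would be an equally plausible assumption.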

Citing Articles

Comparative evaluation of methods for the prediction of protein-ligand binding sites.

Utges J, Barton G. J Cheminform. 2024; 16(1):126.

PMID: 39529176 PMC: 11552181. DOI: 10.1186/s13321-024-00923-z.


HaloClass: Salt-Tolerant Protein Classification with Protein Language Models.

Narang K, Nath A, Hemstrom W, Chu S. Protein J. 2024; 43(6):1035-1044.

PMID: 39432175 PMC: 11543744. DOI: 10.1007/s10930-024-10236-7.
