Learnt Representations of Proteins Can Be Used for Accurate Prediction of Small Molecule Binding Sites on Experimentally Determined and Predicted Protein Structures

Overview

Journal J Cheminform

Publisher BioMed Central

Specialty Chemistry

Date 2024 Mar 15

PMID 38486231

Authors

Anna Carbery

Martin Buttenschoen

Rachael Skyner

Frank von Delft

Charlotte M Deane

Abstract

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. Moreover, due to the training data used, computationally predicted protein structures tend to be extremely accurate and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it also performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.
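The abstract mentions point cloud annotation and clustering but gives no details. The following is only a minimal sketch of the general idea of grouping annotated 3D points into candidate site centres, and is not the IF-SitePred implementation. The function names (cluster_points, site_centres), the 1.5 distance cutoff, the min_size filter, and the use of simple distance-threshold (single-linkage) grouping are all assumptions made for illustration.

```python
from collections import deque
from math import dist


def cluster_points(points, cutoff=1.5):
    """Group 3D points into clusters (hypothetical helper).

    Two points share a cluster when they are linked by a chain of
    neighbours that are each within `cutoff` of one another. The cutoff
    value is an illustrative assumption, not a value from the paper.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        queue = deque([seed])
        members = [seed]
        # Breadth-first expansion from the seed point.
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited if dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return clusters


def site_centres(points, cutoff=1.5, min_size=2):
    """Return centroids of clusters with at least `min_size` points,
    largest cluster first (hypothetical ranking rule)."""
    clusters = [c for c in cluster_points(points, cutoff) if len(c) >= min_size]
    clusters.sort(key=len, reverse=True)
    centres = []
    for c in clusters:
        n = len(c)
        centres.append(tuple(sum(points[i][k] for i in c) / n for k in range(3)))
    return centres


# Toy point cloud: two dense groups and one isolated point.
points = [
    (0, 0, 0), (1, 0, 0), (2, 0, 0),
    (10, 10, 10), (10, 11, 10),
    (50, 50, 50),
]
centres = site_centres(points)
```

In this toy input the isolated point is discarded by the min_size filter, and the remaining two groups are reported as candidate centres ordered by the number of supporting points. A real pipeline would derive the points from per-residue predictions on a protein structure, which this sketch does not attempt.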

Citing Articles

Comparative evaluation of methods for the prediction of protein-ligand binding sites.

Utges J, Barton G. J Cheminform. 2024; 16(1):126.

PMID: 39529176 PMC: 11552181. DOI: 10.1186/s13321-024-00923-z.

HaloClass: Salt-Tolerant Protein Classification with Protein Language Models.

Narang K, Nath A, Hemstrom W, Chu S. Protein J. 2024; 43(6):1035-1044.

PMID: 39432175 PMC: 11543744. DOI: 10.1007/s10930-024-10236-7.
