A Machine Learning Approach Towards the Prediction of Protein-ligand Binding Affinity Based on Fundamental Molecular Properties

Overview

Journal RSC Adv

Publisher Royal Society of Chemistry

Specialty Chemistry

Date 2022 May 11

PMID 35539386

Authors

Indra Kundu

Goutam Paul

Raja Banerjee

Abstract

There is a pressing need to transform the enormous amount of biological data available in various forms into meaningful knowledge. We have applied Machine Learning (ML) models to the protein-ligand binding affinity data already available in order to predict binding affinities that are not yet known. ML methods are appreciably faster and cheaper than traditional experimental methods or computational scoring approaches. The prerequisites for such prediction are sufficient, unbiased features in the training data and a prediction model that fits the data well. In our study, we applied the Random forest and Gaussian process regression algorithms from the Weka package to protein-ligand binding affinity data comprising protein and ligand binding information from the PDBbind database. The models are trained on selected fundamental information about both proteins and ligands, which can be readily fetched from online databases or calculated when the structure is available. The models were assessed on the basis of the correlation coefficient and the root mean square error (RMSE). The Random forest model gave a correlation coefficient of 0.76 and an RMSE of 1.31. We also applied our features and prediction models to the dataset used by others and found that our model with our features outperformed the existing ones.
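The two assessment metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow. It assumes the correlation coefficient is the Pearson correlation between measured and predicted affinities, which this abstract does not confirm. The function names `rmse` and `pearson_r` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation (assumed here to be the reported coefficient)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Hypothetical affinities, not taken from the paper.
    measured = [4.2, 6.1, 7.8, 5.5, 9.0]
    predicted = [4.9, 5.8, 7.1, 6.0, 8.2]
    print("correlation:", round(pearson_r(measured, predicted), 3))
    print("RMSE:", round(rmse(measured, predicted), 3))
```

A higher correlation together with a lower RMSE indicates closer agreement between predicted and measured affinities. Reporting both is useful because a model can rank complexes well and still be offset in absolute value.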

Citing Articles

Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data.

Valsson I, Warren M, Deane C, Magarkar A, Morris G, Biggin P Commun Chem. 2025; 8(1):41.

PMID: 39922899 PMC: 11807228. DOI: 10.1038/s42004-025-01428-y.

Advances in Protein-Ligand Binding Affinity Prediction via Deep Learning: A Comprehensive Study of Datasets, Data Preprocessing Techniques, and Model Architectures.

Abdelkader G, Kim J Curr Drug Targets. 2024; 25(15):1041-1065.

PMID: 39318214 PMC: 11774311. DOI: 10.2174/0113894501330963240905083020.

DOCK-PET: database of CNS kinetic parameters in the healthy human brain for existing PET tracers.

Miyajima I, Yoshikawa A, Sahashi K, Seki C, Nagai Y, Watabe H Ann Nucl Med. 2024; 38(8):666-672.

PMID: 38814564 DOI: 10.1007/s12149-024-01947-z.

Drug Repurposing: Insights into Current Advances and Future Applications.

Bhatia T, Sharma S Curr Med Chem. 2023; 32(3):468-510.

PMID: 37946344 DOI: 10.2174/0109298673266470231023110841.

In-Silico Approaches for the Screening and Discovery of Broad-Spectrum Marine Natural Product Antiviral Agents Against Coronaviruses.

Boswell Z, Verga J, Mackle J, Guerrero-Vazquez K, Thomas O, Cray J Infect Drug Resist. 2023; 16:2321-2338.

PMID: 37155475 PMC: 10122865. DOI: 10.2147/IDR.S395203.
