
A Machine Learning Approach to Predicting Protein-ligand Binding Affinity with Applications to Molecular Docking

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2010 Mar 19
PMID: 20236947
Citations: 255
Abstract

Motivation: Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined, theory-inspired functional form for the relationship between the variables that characterize the complex (including parameters fitted to experimental or simulation data) and its predicted binding affinity. The inherent problem with this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against overfitting the calibration data when estimating scoring function parameters.
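To make the resampling point concrete, the following is a minimal sketch of k-fold cross-validation applied to a fitted model. It is not taken from the paper: the one-variable linear model stands in for any parametric scoring function, the data are synthetic, and all names (k_fold_indices, fit_line, cross_validated_rmse) are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for parameter calibration)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on the given samples."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys)) ** 0.5

def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE: each fold is scored by a model that never saw it."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in held_out],
                           [ys[i] for i in held_out]))
    return mean(scores)

# Synthetic calibration data (an assumption, not real binding affinities).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [0.5 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]

train_error = rmse(fit_line(xs, ys), xs, ys)  # error on the calibration data itself
cv_error = cross_validated_rmse(xs, ys)       # estimate of error on unseen data
print(f"training RMSE: {train_error:.2f}, 5-fold CV RMSE: {cv_error:.2f}")
```

A large gap between the two numbers would suggest that the fitted parameters are tuned to the calibration set rather than to the underlying relationship.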

Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.
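As an illustration of the general technique only, the sketch below implements a small random forest regressor (bootstrap samples plus a random feature subset at each split) and evaluates it at several training set sizes. It is not the authors' implementation of RF-Score: the four integer descriptors, the synthetic target, and every function name are assumptions made for the example, and a real study would normally rely on an established library rather than this standard-library version.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (impurity of a node)."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, rng, n_try, min_leaf=3, max_depth=8, depth=0):
    """Grow a regression tree; each split considers n_try random features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return avg(y)  # leaf: mean target of the samples that reached it
    best = None  # (impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X})[1:]:
            left = [y[i] for i, row in enumerate(X) if row[f] < t]
            right = [y[i] for i, row in enumerate(X) if row[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return avg(y)
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] < t]
    hi = [i for i, row in enumerate(X) if row[f] >= t]
    return (f, t,
            build_tree([X[i] for i in lo], [y[i] for i in lo],
                       rng, n_try, min_leaf, max_depth, depth + 1),
            build_tree([X[i] for i in hi], [y[i] for i in hi],
                       rng, n_try, min_leaf, max_depth, depth + 1))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves floats."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(X, y, n_trees=30, n_try=2, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, n_try))
    return forest

def predict_forest(forest, row):
    """The forest prediction is the average over all trees."""
    return avg([predict_tree(tree, row) for tree in forest])

def rmse(forest, X, y):
    return avg([(predict_forest(forest, r) - t) ** 2
                for r, t in zip(X, y)]) ** 0.5

def make_data(n, rng):
    """Synthetic descriptors and targets (assumed, not real complexes)."""
    X = [[rng.randint(0, 9) for _ in range(4)] for _ in range(n)]
    y = [0.8 * r[0] + 0.5 * r[1] - 0.3 * r[2] + rng.gauss(0, 0.5) for r in X]
    return X, y

rng = random.Random(1)
X_train, y_train = make_data(300, rng)
X_test, y_test = make_data(100, rng)

# Held-out error as a function of training set size.
for n in (30, 100, 300):
    forest = fit_forest(X_train[:n], y_train[:n])
    print(n, round(rmse(forest, X_test, y_test), 2))
```

No functional form is specified anywhere in the model: the trees partition the descriptor space directly from the data, which is the sense in which the approach is non-parametric. Printing the held-out error for increasing n gives a simple learning curve of the kind the abstract alludes to.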

Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

A database for large-scale docking and experimental results.

Hall B, Tummino T, Tang K, Irwin J, Shoichet B bioRxiv. 2025.

PMID: 40060496 PMC: 11888352. DOI: 10.1101/2025.02.25.639879.


Structural bioinformatics for rational drug design.

Mozaffari S, Moen A, Ng C, Nicolaes G, Wichapong K Res Pract Thromb Haemost. 2025; 9(1):102691.

PMID: 40027444 PMC: 11869865. DOI: 10.1016/j.rpth.2025.102691.


Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data.

Valsson I, Warren M, Deane C, Magarkar A, Morris G, Biggin P Commun Chem. 2025; 8(1):41.

PMID: 39922899 PMC: 11807228. DOI: 10.1038/s42004-025-01428-y.


Benchmarking the robustness of the correct identification of flexible 3D objects using common machine learning models.

Zhang Y, Vitalis A Patterns (N Y). 2025; 6(1):101147.

PMID: 39896260 PMC: 11783895. DOI: 10.1016/j.patter.2024.101147.


Robustly interrogating machine learning-based scoring functions: what are they learning?

Durant G, Boyles F, Birchall K, Marsden B, Deane C Bioinformatics. 2025; 41(2).

PMID: 39874452 PMC: 11821266. DOI: 10.1093/bioinformatics/btaf040.

