The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks

Overview
Journal: Int J Mol Sci
Publisher: MDPI
Date: 2023 Nov 25
PMID: 38003312
Abstract

Artificial intelligence (AI) has gained significant traction in drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advances in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau. This raises the question of whether the problem is truly solved or whether current performance is overly optimistic and reliant on biased, easily predictable data. As with other DL problems, the issue appears to stem from the training and test sets used to build the models. In this work, we investigate how several parameters of the input data affect the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models. Furthermore, training a model with as much data as possible matters more than restricting training to high-quality datasets. Finally, we confirm the bias in the test sets currently in common use. Several types of evaluation and benchmarking are therefore required to understand models' decision-making processes and to compare model performance accurately.

Citing Articles

Protein language models are performant in structure-free virtual screening.

Lam H, Guan J, Ong X, Pincket R, Mu Y. Brief Bioinform. 2024; 25(6).

PMID: 39327890 PMC: 11427677. DOI: 10.1093/bib/bbae480.
