Interpretation of Machine Learning Models Using Shapley Values: Application to Compound Potency and Multi-target Activity Predictions

Overview

Journal: J Comput Aided Mol Des

Publisher: Springer

Specialties: Biomedical Engineering, Molecular Biology

Date: 2020 May 4

PMID: 32361862

Citations: 104

Authors

Raquel Rodriguez-Perez

Jurgen Bajorath

Abstract

Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research. There is a need for model-agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for exact calculation of Shapley values for decision tree methods and systematically comparing this variant with the model-independent SHAP method in compound activity and potency value predictions. Moreover, new applications of the SHAP analysis approach are presented, including the interpretation of DNN models for the generation of multi-target activity profiles and of ensemble regression models for potency prediction.
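To illustrate what a Shapley value attribution means for a single prediction, the following is a minimal sketch that computes exact Shapley values by enumerating all feature coalitions. It is not the authors' implementation, nor the tree-specific or model-independent SHAP variants compared in the paper. The function `shapley_values`, the `toy_model` scoring function, and the all-zero baseline are hypothetical names and choices introduced here for illustration. Features absent from a coalition are replaced by a baseline value, which is one common convention and an assumption of this sketch.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by brute-force coalition enumeration.

    Assumption: a feature outside the coalition takes its baseline value.
    The cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep the instance's value; others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical potency-like score over three binary fingerprint bits:
    # bit 0 contributes on its own, bits 1 and 2 only contribute together.
    return 2.0 * z[0] + 1.0 * z[1] * z[2]


x = [1, 1, 1]          # instance to explain (all three bits set)
baseline = [0, 0, 0]   # reference input (no bits set), an illustrative choice
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For this toy model the attributions come out at approximately 2.0 for bit 0 and 0.5 each for bits 1 and 2: the interaction term is split evenly between the two bits that jointly produce it. The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name.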

Citing Articles

KinasePred: A Computational Tool for Small-Molecule Kinase Target Prediction.

Di Stefano M, Piazza L, Poles C, Galati S, Granchi C, Giordano A. Int J Mol Sci. 2025; 26(5).

PMID: 40076779 PMC: 11900317. DOI: 10.3390/ijms26052157.

Prediction of contrast-associated acute kidney injury with machine-learning in patients undergoing contrast-enhanced computed tomography in emergency department.

Lee K, Jung W, Jeon J, Chang H, Lee J, Huh W. Sci Rep. 2025; 15(1):7088.

PMID: 40016350 PMC: 11868533. DOI: 10.1038/s41598-025-86933-9.

Development and validation of a machine learning approach for screening new leprosy cases based on the leprosy suspicion questionnaire.

Mendonca Ramos Simoes M, Rocha Lima F, Barbosa Lugao H, de Paula N, Lincoln Silva C, Ramos A. Sci Rep. 2025; 15(1):6912.

PMID: 40011614 PMC: 11865526. DOI: 10.1038/s41598-025-91462-6.

Explanatory AI Predicts the Diet Adopted Based on Nutritional and Lifestyle Habits in the Spanish Population.

Sandri E, Cerda Olmedo G, Piredda M, Werner L, Dentamaro V. Eur J Investig Health Psychol Educ. 2025; 15(2).

PMID: 39997075 PMC: 11854735. DOI: 10.3390/ejihpe15020011.

Writing the Signs: An Explainable Machine Learning Approach for Alzheimer's Disease Classification from Handwriting.

Ho N, Gonzalez P, Gogovi G. Healthc Technol Lett. 2025; 12(1):e70006.

PMID: 39949642 PMC: 11822997. DOI: 10.1049/htl2.70006.
