Systematic Generation and Analysis of Counterfactuals for Compound Activity Predictions Using Multi-task Models

Overview
Journal: RSC Med Chem
Specialty: Chemistry
Date: 2024 May 24
PMID: 38784468
Abstract

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design. For ML, limited trust in predictions presents a substantial problem and continues to limit its impact in interdisciplinary research, including early-phase drug discovery. As a remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help to rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases with small modifications yielding opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry, such as compound activity prediction, CFs are particularly intuitive because these hypothetical molecules enable immediate comparisons with actual test compounds; such comparisons require no expert ML knowledge and are accessible to practicing chemists. They often reveal structural moieties in compounds that determine their predictions and can be further investigated. Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
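
The notion of a counterfactual described in the abstract (a test case with a small modification that yields a different prediction) can be illustrated with a minimal sketch. This is not the authors' method: the classifier `toy_model`, its weights and threshold, the binary feature encoding, and the helper `find_counterfactuals` are all hypothetical stand-ins chosen only to make the idea concrete. In a real setting, the features might encode the presence or absence of structural moieties and the model would be a trained multi-task predictor.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Features = Tuple[int, ...]


def toy_model(x: Sequence[int]) -> int:
    """Hypothetical stand-in classifier (assumption, not from the paper).

    Returns class 1 when the weighted sum of the binary features
    reaches the threshold of 3, and class 0 otherwise.
    """
    weights = (2, -1, 3, 1)
    score = sum(w * v for w, v in zip(weights, x))
    return 1 if score >= 3 else 0


def find_counterfactuals(
    x: Features,
    predict: Callable[[Sequence[int]], int],
    max_changes: int = 1,
) -> List[Features]:
    """Enumerate variants of x with at most max_changes flipped bits
    whose predicted class differs from the prediction for x."""
    original = predict(x)
    found: List[Features] = []
    for k in range(1, max_changes + 1):
        for idx in combinations(range(len(x)), k):
            candidate = list(x)
            for i in idx:
                candidate[i] = 1 - candidate[i]  # flip one binary feature
            if predict(candidate) != original:
                found.append(tuple(candidate))
    return found


test_case = (1, 0, 1, 0)
print(toy_model(test_case))                        # 1
print(find_counterfactuals(test_case, toy_model))  # [(1, 0, 0, 0)]
```

In this toy case, removing the third feature alone changes the predicted class, which marks that feature as decisive for the prediction. This mirrors how comparing a counterfactual with the original test compound can point to the structural moiety that determines the outcome.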

Citing Articles

SIGMAP: an explainable artificial intelligence tool for SIGMA-1 receptor affinity prediction.

Lomuscio M, Corriero N, Nanna V, Piccinno A, Saviano M, Lanzilotti R. RSC Med Chem. 2024; 16(2):835-848.

PMID: 39618965 PMC: 11605305. DOI: 10.1039/d4md00722k.
