PMID: 36456532

Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

Abstract

Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in their structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that can accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine learning and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
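To make the activity cliff definition and the idea of an "activity-cliff-centered" metric concrete, the following is a minimal sketch in plain Python. It is not the MoleculeACE implementation. The function names, the use of Tanimoto similarity on sets of fingerprint bits, the similarity threshold of 0.9, the 10-fold potency threshold, and the toy data are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cliff_compounds(fingerprints, pki, sim_threshold=0.9, fold_threshold=10.0):
    """Indices of molecules that take part in at least one activity cliff pair.

    A pair counts as a cliff when it is structurally similar (Tanimoto at or
    above sim_threshold) and differs in potency by at least fold_threshold.
    Potencies are on a log10 scale (e.g. pKi), so a 10-fold gap is 1 log unit.
    Both thresholds are illustrative assumptions.
    """
    log_fold = math.log10(fold_threshold)
    cliffs = set()
    for i, j in combinations(range(len(pki)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        potency_gap = abs(pki[i] - pki[j]) >= log_fold
        if similar and potency_gap:
            cliffs.update((i, j))
    return sorted(cliffs)


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error, optionally restricted to a subset of indices."""
    indices = list(range(len(y_true)) if indices is None else indices)
    squared = sum((y_true[i] - y_pred[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


# Hypothetical toy data: sets of fingerprint bits and measured pKi values.
fingerprints = [
    set(range(20)),             # molecule 0
    set(range(19)) | {20},      # molecule 1: near-identical to molecule 0
    set(range(30, 50)),         # molecule 2
    set(range(30, 50)) | {50},  # molecule 3: near-identical to molecule 2
]
pki_true = [8.0, 6.5, 7.0, 7.2]
pki_pred = [7.0, 7.0, 7.1, 7.1]  # hypothetical predictions from some model

cliff_idx = cliff_compounds(fingerprints, pki_true)
overall = rmse(pki_true, pki_pred)
on_cliffs = rmse(pki_true, pki_pred, cliff_idx)

print("cliff compounds:", cliff_idx)
print(f"RMSE (all): {overall:.3f}  RMSE (cliff compounds): {on_cliffs:.3f}")
```

In this toy set only molecules 0 and 1 form a cliff: molecules 2 and 3 are just as similar to each other but differ by only 0.2 log units. Reporting the error on the cliff subset alongside the overall error shows how a model that looks acceptable on average can still perform poorly on exactly these pairs.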

Citing Articles

A data-driven generative strategy to avoid reward hacking in multi-objective molecular design.

Yoshizawa T, Ishida S, Sato T, Ohta M, Honma T, Terayama K Nat Commun. 2025; 16(1):2409.

PMID: 40069140 PMC: 11897179. DOI: 10.1038/s41467-025-57582-3.


A database for large-scale docking and experimental results.

Hall B, Tummino T, Tang K, Irwin J, Shoichet B bioRxiv. 2025; .

PMID: 40060496 PMC: 11888352. DOI: 10.1101/2025.02.25.639879.


Achieving well-informed decision-making in drug discovery: a comprehensive calibration study using neural network-based structure-activity models.

Friesacher H, Engkvist O, Mervin L, Moreau Y, Arany A J Cheminform. 2025; 17(1):29.

PMID: 40045403 PMC: 11881400. DOI: 10.1186/s13321-025-00964-y.


MultiCycPermea: accurate and interpretable prediction of cyclic peptide permeability using a multimodal image-sequence model.

Wang Z, Chen Y, Shang Y, Yang X, Pan W, Ye X BMC Biol. 2025; 23(1):63.

PMID: 40016695 PMC: 11866622. DOI: 10.1186/s12915-025-02166-2.


In Silico Approach for Antibacterial Discovery: PTML Modeling of Virtual Multi-Strain Inhibitors Against .

Kleandrova V, Cordeiro M, Speck-Planche A Pharmaceuticals (Basel). 2025; 18(2).

PMID: 40006010 PMC: 11858522. DOI: 10.3390/ph18020196.

