Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction

Overview

Journal J Chem Inf Model

Publisher American Chemical Society

Specialties Chemistry
Medical Informatics

Date 2020 Nov 27

PMID 33245237

Citations 3

Authors

Elena L Caceres

Nicholas C Mew

Michael J Keiser

Abstract

Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). SNA increases in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty into training improves predictions of drug-target relationships.
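The core idea in the abstract, filling a random subset of untested molecule-target pairs with temporary negative labels on each training pass, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_stochastic_negatives`, the `ratio` parameter, the `None`-for-untested matrix layout, and the choice of `0.0` as the negative label are all assumptions made for illustration. The last lines simply check the arithmetic behind the reported 122% relative gain.

```python
import random


def add_stochastic_negatives(labels, ratio=1.0, negative_value=0.0, rng=None):
    """Return a copy of `labels` in which a random sample of untested cells
    (marked None) is filled with `negative_value`.

    Hypothetical helper: `ratio` is the assumed number of added negatives
    per known label; the original matrix is never modified.
    """
    rng = rng or random.Random()
    # Collect coordinates of every untested molecule-target pair.
    untested = [
        (i, j)
        for i, row in enumerate(labels)
        for j, value in enumerate(row)
        if value is None
    ]
    n_known = sum(value is not None for row in labels for value in row)
    n_add = min(len(untested), int(round(ratio * n_known)))

    augmented = [list(row) for row in labels]
    for i, j in rng.sample(untested, n_add):
        augmented[i][j] = negative_value
    return augmented


# Rows are molecules, columns are targets; None means "never tested".
labels = [
    [1.0, None, None, 0.0],
    [None, 1.0, None, None],
    [None, None, 0.0, None],
]

rng = random.Random(0)
epoch_views = []
for epoch in range(3):
    # A fresh sample is drawn each epoch, so the added negatives are transient.
    epoch_labels = add_stochastic_negatives(labels, ratio=1.0, rng=rng)
    epoch_views.append(epoch_labels)
    # A model would be trained on `epoch_labels` here.

# Relative change implied by the two drug-screening scores in the abstract.
relative_gain = (0.4269 - 0.1926) / 0.1926
```

Because the augmented matrix is rebuilt from the original labels every epoch, a pair assumed negative in one pass returns to untested in the next, which keeps any single wrong assumption from being reinforced throughout training.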

Citing Articles

One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening.

Wellnitz J, Jain S, Hochuli J, Maxfield T, Muratov E, Tropsha A. J Cheminform. 2025; 17(1):7.

PMID: 39819357 PMC: 11740363. DOI: 10.1186/s13321-025-00948-y.

Biomedical data analyses facilitated by open cheminformatics workflows.

Nittinger E, Clark A, Gaulton A, Zdrazil B. J Cheminform. 2023; 15(1):46.

PMID: 37069670 PMC: 10108476. DOI: 10.1186/s13321-023-00718-8.

BiComp-DTA: Drug-target binding affinity prediction through complementary biological-related and compression-based featurization approach.

Kalemati M, Zamani Emani M, Koohi S. PLoS Comput Biol. 2023; 19(3):e1011036.

PMID: 37000857 PMC: 10096306. DOI: 10.1371/journal.pcbi.1011036.

Comparing classification models-a practical tutorial.

Walters W. J Comput Aided Mol Des. 2021; 36(5):381-389.

PMID: 34549368 DOI: 10.1007/s10822-021-00417-2.