Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods

Overview

Journal: J Chem Inf Model

Publisher: American Chemical Society

Specialties: Chemistry, Medical Informatics

Date: 2021 Feb 3

PMID: 33533614

Citations: 16

Authors

Sankalp Jain

Vishal B Siramshetty

Vinicius M Alves

Eugene N Muratov

Nicole Kleinstreuer

Alexander Tropsha

Marc C Nicklaus

Anton Simeonov

Alexey V Zakharov

Abstract

Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
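
The abstract does not spell out how the consensus of the three multitask approaches is formed, so the following is only a minimal sketch under stated assumptions: the consensus is taken as the plain arithmetic mean of the per-model predictions, and end points not measured for a compound are stored as NaN and skipped when scoring. The function names, array shapes, and toy values are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def consensus_predict(predictions):
    """Average predictions from several models.

    Assumption: each array has shape (n_compounds, n_tasks) and the
    consensus is an unweighted mean; the paper may use another scheme.
    """
    stacked = np.stack(predictions, axis=0)  # (n_models, n_compounds, n_tasks)
    return stacked.mean(axis=0)


def masked_rmse_per_task(y_true, y_pred):
    """RMSE per task, ignoring entries where y_true is NaN (not measured)."""
    mask = ~np.isnan(y_true)
    # Replace missing labels with 0 before subtracting, then zero out
    # their squared errors so they do not contribute to the sum.
    sq_err = np.where(mask, (np.where(mask, y_true, 0.0) - y_pred) ** 2, 0.0)
    counts = mask.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        # A task with no measured compounds yields NaN.
        return np.sqrt(sq_err.sum(axis=0) / counts)


# Toy data: 3 compounds, 2 end points, sparse labels (hypothetical values).
y_true = np.array([[1.0, np.nan],
                   [2.0, 3.0],
                   [np.nan, 5.0]])
model_a = np.array([[1.2, 0.0], [1.8, 3.3], [0.0, 4.7]])
model_b = np.array([[0.8, 0.0], [2.2, 2.7], [0.0, 5.3]])
model_c = np.array([[1.0, 0.0], [2.0, 3.0], [0.0, 5.0]])

consensus = consensus_predict([model_a, model_b, model_c])
rmse_single = masked_rmse_per_task(y_true, model_a)
rmse_consensus = masked_rmse_per_task(y_true, consensus)
```

In this toy case the errors of the individual models cancel in the average, which illustrates why a consensus can help on sparsely populated tasks; it says nothing about the size of the effect reported in the study.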

Citing Articles

One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening.

Wellnitz J, Jain S, Hochuli J, Maxfield T, Muratov E, Tropsha A. J Cheminform. 2025; 17(1):7.

PMID: 39819357 PMC: 11740363. DOI: 10.1186/s13321-025-00948-y.

Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery.

Schuh M, Boldini D, Sieber S. J Chem Inf Model. 2024; 64(12):4640-4650.

PMID: 38836773 PMC: 11200265. DOI: 10.1021/acs.jcim.4c00765.

Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction.

Amorim A, Piochi L, Gaspar A, Preto A, Rosario-Ferreira N, Moreira I. Chem Res Toxicol. 2024; 37(6):827-849.

PMID: 38758610 PMC: 11187637. DOI: 10.1021/acs.chemrestox.3c00352.

Expanding Predictive Capacities in Toxicology: Insights from Hackathon-Enhanced Data and Model Aggregation.

Shkil D, Muhamedzhanova A, Petrov P, Skorb E, Aliev T, Steshin I. Molecules. 2024; 29(8).

PMID: 38675645 PMC: 11055041. DOI: 10.3390/molecules29081826.

Computational models for predicting liver toxicity in the deep learning era.

Mostafa F, Chen M. Front Toxicol. 2024; 5:1340860.

PMID: 38312894 PMC: 10834666. DOI: 10.3389/ftox.2023.1340860.
