
Assessing the Calibration in Toxicological in Vitro Models with Conformal Prediction

Overview
Journal J Cheminform
Publisher Biomed Central
Specialty Chemistry
Date 2021 Apr 30
PMID 33926567
Citations 6
Abstract

Machine learning methods are widely used in drug discovery and toxicity prediction. While they generally perform well in cross-validation studies, their predictive power often drops when the query samples have drifted away from the training data's descriptor space. The assumption underlying machine learning algorithms, that training and test data stem from the same distribution, is therefore not always fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error rate may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, which consist of the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that although internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve predictions on the external sets, a strategy that exchanges the calibration set for more recent data, such as Tox21Test, was successfully introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, which exchanges only the calibration data, is convenient because it does not require retraining the underlying model.
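
The calibration check and the calibration-set exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a class-wise (Mondrian) inductive conformal classifier, and it uses synthetic one-dimensional data in place of the Tox21 subsets. The names `model_probs`, `sample`, `p_values` and `error_rate` are hypothetical, and `model_probs` stands in for an already trained model that is never refitted.

```python
import numpy as np

def model_probs(x):
    """Hypothetical fixed model: class probabilities from a 1-D descriptor."""
    p1 = 1.0 / (1.0 + np.exp(-x))
    return np.column_stack([1.0 - p1, p1])

def sample(rng, n, shift):
    """Synthetic data (assumption): class means at -1 and +1, moved by `shift`."""
    labels = rng.integers(0, 2, size=n)
    x = rng.normal(loc=2.0 * labels - 1.0 + shift, scale=1.0)
    return x, labels

def p_values(cal_probs, cal_labels, test_probs):
    """Class-wise conformal p-values for every test object and class."""
    n_test, n_classes = test_probs.shape
    pvals = np.zeros((n_test, n_classes))
    for c in range(n_classes):
        # Nonconformity score: 1 minus the probability assigned to class c.
        cal_scores = 1.0 - cal_probs[cal_labels == c, c]
        test_scores = 1.0 - test_probs[:, c]
        # Count calibration scores at least as nonconforming as each test score.
        ge = (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)
        pvals[:, c] = (ge + 1) / (len(cal_scores) + 1)
    return pvals

def error_rate(pvals, labels, eps):
    """Fraction of objects whose true class is excluded at significance eps."""
    true_p = pvals[np.arange(len(labels)), labels]
    return float(np.mean(true_p <= eps))

rng = np.random.default_rng(0)
eps = 0.2  # significance level, i.e. the expected error rate

# Older calibration data from the training distribution, and two drifted sets:
# a more recent calibration set and the external test set.
x_cal_old, y_cal_old = sample(rng, 2000, shift=0.0)
x_cal_new, y_cal_new = sample(rng, 2000, shift=1.5)
x_test, y_test = sample(rng, 2000, shift=1.5)

test_probs = model_probs(x_test)

# Same model, two calibration sets: only the calibration scores change.
err_old = error_rate(p_values(model_probs(x_cal_old), y_cal_old, test_probs), y_test, eps)
err_new = error_rate(p_values(model_probs(x_cal_new), y_cal_new, test_probs), y_test, eps)

print(f"expected error: {eps:.2f}")
print(f"observed error, old calibration set: {err_old:.3f}")
print(f"observed error, recent calibration set: {err_new:.3f}")
```

In this toy setting, an observed error rate well above `eps` with the older calibration set signals that calibration and test data come from different distributions. Recalibrating on data drawn from the drifted distribution brings the observed error back towards `eps` without touching `model_probs`.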

Citing Articles

CPSign: conformal prediction for cheminformatics modeling.

Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, Spjuth O. J Cheminform. 2024; 16(1):75.

PMID: 38943219 PMC: 11214261. DOI: 10.1186/s13321-024-00870-9.


Reliable anti-cancer drug sensitivity prediction and prioritization.

Lenhof K, Eckhart L, Rolli L, Volkamer A, Lenhof H. Sci Rep. 2024; 14(1):12303.

PMID: 38811639 PMC: 11137046. DOI: 10.1038/s41598-024-62956-6.


Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon.

Bassani D, Brigo A, Andrews-Morger A. Chem Res Toxicol. 2023; 36(9):1503-1517.

PMID: 37584277 PMC: 10523574. DOI: 10.1021/acs.chemrestox.3c00137.


Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data.

Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M. Sci Rep. 2022; 12(1):7244.

PMID: 35508546 PMC: 9068909. DOI: 10.1038/s41598-022-09309-3.


Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning.

Norinder U, Spjuth O, Svensson F. J Cheminform. 2021; 13(1):77.

PMID: 34600569 PMC: 8487527. DOI: 10.1186/s13321-021-00555-7.

