Datasets Construction and Development of QSAR Models for Predicting Micronucleus In Vitro and In Vivo Assay Outcomes

Overview
Journal: Toxics
Date: 2023 Sep 27
PMID: 37755795
Abstract

In silico (quantitative) structure-activity relationship ((Q)SAR) modeling offers a fast and cost-effective alternative for assessing the genotoxic potential of chemicals. However, model development is limited by the availability of consolidated experimental datasets. In the present study, we collected experimental micronucleus data, both in vitro and in vivo, from databases and from a PubMed search aided by text mining with the BioBERT large language model. Chemotype enrichment analysis was performed on the updated datasets to identify enriched substructures, and chemotypes common to both endpoints were also identified. Five machine learning algorithms, combined with molecular descriptors, twelve types of fingerprints, and two data-balancing techniques, were used to construct individual models, and the best-performing individual models were selected for ensemble construction. The curated final datasets comprise 981 chemicals for the in vitro micronucleus endpoint and 1309 chemicals for the mouse in vivo micronucleus endpoint. Of the 18 chemotypes enriched in the in vitro micronucleus data, only 7 were found to be relevant for in vivo prediction. The ensemble model exhibited high accuracy and sensitivity on an external test set of in vitro data, and good balanced predictive performance was also achieved for the in vivo micronucleus endpoint.
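The abstract does not state which statistic drives the chemotype enrichment analysis. A common approach, assumed here rather than taken from the paper, is to build a 2×2 table of chemotype presence versus assay outcome for each chemotype and report it as enriched when the odds ratio is high and a one-sided Fisher's exact test is significant. The sketch below uses only the Python standard library; the function names, the chemotype labels (`nitro`, `alkyl`), and the thresholds (odds ratio ≥ 3, p < 0.05) are hypothetical illustration choices.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value, P(X >= a), for the 2x2 table:

                        positive  negative
    chemotype present       a         b
    chemotype absent        c         d
    """
    row1 = a + b          # chemicals containing the chemotype
    col1 = a + c          # positive chemicals
    n = a + b + c + d     # all chemicals
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when the chemotype never occurs in negatives."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def enriched_chemotypes(records, min_odds_ratio=3.0, alpha=0.05):
    """records: list of (set_of_chemotype_ids, is_positive) pairs.

    The default thresholds are HYPOTHETICAL, not values reported by the paper.
    """
    chemotypes = set().union(*(cts for cts, _ in records))
    n_pos = sum(1 for _, y in records if y)
    n_neg = len(records) - n_pos
    hits = {}
    for ct in sorted(chemotypes):
        a = sum(1 for cts, y in records if y and ct in cts)      # present, positive
        b = sum(1 for cts, y in records if not y and ct in cts)  # present, negative
        c, d = n_pos - a, n_neg - b                              # absent counts
        ratio = odds_ratio(a, b, c, d)
        p = fisher_one_sided_greater(a, b, c, d)
        if ratio >= min_odds_ratio and p < alpha:
            hits[ct] = (ratio, p)
    return hits


# Toy data with hypothetical chemotype labels: 10 positive and 10 negative chemicals
records = (
    [({"nitro"}, True)] * 6 + [({"alkyl"}, True)] * 3 + [(set(), True)]
    + [({"alkyl"}, False)] * 3 + [(set(), False)] * 7
)
print(enriched_chemotypes(records))
```

With this toy data only `nitro` passes both thresholds (p ≈ 0.0054), while `alkyl` occurs equally often in positives and negatives and has an odds ratio of 1. Running the same function separately on in vitro and in vivo records is one way to compare which enriched chemotypes carry over between endpoints.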
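The abstract reports accuracy, sensitivity, and balanced performance but does not say how the best individual models are combined. The sketch below assumes a simple majority vote over binary predictions (1 = positive) and computes the metrics from the confusion matrix. Balanced accuracy, the mean of sensitivity and specificity, is not inflated by a dominant class, which matters for imbalanced datasets of the kind that data-balancing techniques address. The function names, the tie rule, and the toy predictions are hypothetical.

```python
def majority_vote(model_predictions):
    """Combine binary predictions from several models by majority vote.

    Ties (possible with an even number of models) are resolved as positive,
    a HYPOTHETICAL choice that favors sensitivity.
    """
    n_models = len(model_predictions)
    return [1 if 2 * sum(votes) >= n_models else 0 for votes in zip(*model_predictions)]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example: three hypothetical models scoring eight chemicals
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 1, 0, 1, 0, 0]
model_b = [1, 0, 1, 1, 0, 0, 1, 0]
model_c = [0, 1, 1, 1, 0, 0, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("model_a", model_a), ("ensemble", ensemble)]:
    print(name, classification_metrics(y_true, preds))
```

Each toy model makes two errors, but on different chemicals, so the vote recovers all eight labels. Real ensembles rarely behave this cleanly; the example only shows why combining models with uncorrelated errors can raise sensitivity and balanced accuracy over the individual members.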
