
Hybrid Fragment-SMILES Tokenization for ADMET Prediction in Drug Discovery

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2024 Aug 1
PMID: 39090573
Abstract

Background: Drug discovery and development is an extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate must satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches offer an opportunity to improve each step of this process, and the first question they face is how to represent a molecule informatively so that in silico solutions can be optimized.

Results: This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, for a Transformer-based model. We investigate whether hybrid tokenization improves performance on ADMET prediction tasks. Our approach builds on MTL-BERT, an encoder-only Transformer model that has achieved state-of-the-art ADMET predictions, and compares standard SMILES tokenization with our hybrid method across a range of fragment-library cutoffs.
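To make the idea of hybrid tokenization concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the fragment vocabulary is given as plain SMILES substrings and merges them by greedy longest match over atom-level tokens; everything else falls back to a widely used atom-level SMILES regex. The names `smiles_tokens` and `hybrid_tokenize` are hypothetical. A real pipeline would more likely derive fragments chemically (by bond-breaking rules) rather than by string matching, because the same substructure can be written with different SMILES strings and ring-closure digits.

```python
import re
from typing import Dict, Iterable, List, Tuple

# A widely used atom-level SMILES regex: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds, branches, and ring-closure digits.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def smiles_tokens(smiles: str) -> List[str]:
    """Split a SMILES string into character/atom-level tokens."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize SMILES: {smiles!r}")
    return tokens


def hybrid_tokenize(smiles: str, fragments: Iterable[str]) -> List[str]:
    """Hypothetical hybrid tokenizer: greedy longest match of known fragments,
    falling back to atom-level SMILES tokens everywhere else."""
    # Pre-tokenize each fragment so matches respect SMILES token boundaries
    # (e.g. a fragment ending in "C" can never swallow half of "Cl").
    frag_by_tokens: Dict[Tuple[str, ...], str] = {
        tuple(smiles_tokens(f)): f for f in fragments
    }
    max_len = max((len(t) for t in frag_by_tokens), default=0)
    toks = smiles_tokens(smiles)
    out: List[str] = []
    i = 0
    while i < len(toks):
        # Try the longest multi-token fragment first, down to two tokens.
        for n in range(min(max_len, len(toks) - i), 1, -1):
            candidate = tuple(toks[i:i + n])
            if candidate in frag_by_tokens:
                out.append(frag_by_tokens[candidate])
                i += n
                break
        else:
            # No fragment starts here: emit a single SMILES token.
            out.append(toks[i])
            i += 1
    return out
```

For aspirin, `hybrid_tokenize("CC(=O)Oc1ccccc1C(=O)O", ["c1ccccc1", "C(=O)O"])` returns `["C", "C(=O)O", "c1ccccc1", "C(=O)O"]`: both fragment types become single tokens, while the leftover carbon stays at character level.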

Conclusion: The findings reveal that while an excess of fragments can impede performance, hybrid tokenization with high-frequency fragments improves results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features when training Transformer models for ADMET property prediction.
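The fragment-library cutoff controls how many fragments enter the vocabulary. As a hedged sketch, assuming each molecule has already been decomposed into fragment SMILES (the decomposition step is not shown) and that the cutoff is a minimum total occurrence count, a frequency-filtered library could be built as follows. `build_fragment_library` and `min_count` are illustrative names, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List


def build_fragment_library(decomposed: Iterable[Iterable[str]], min_count: int) -> List[str]:
    """Keep fragments that occur at least `min_count` times across the corpus.

    `decomposed` holds one list of fragment SMILES per molecule (hypothetical
    input; a real pipeline would produce it with a chemical fragmenter).
    """
    # Count total occurrences of each fragment over all molecules.
    counts = Counter(frag for frags in decomposed for frag in frags)
    kept = [f for f, c in counts.items() if c >= min_count]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda f: (-counts[f], f))
```

Raising `min_count` keeps only high-frequency fragments and shrinks the vocabulary; lowering it admits rare fragments, which corresponds to the "excess of fragments" regime the abstract reports can hurt performance.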
