
A Hitchhiker's Guide to Deep Chemical Language Processing for Bioactivity Prediction

Overview
Journal Digit Discov
Date 2024 Dec 27
PMID 39726698
Abstract

Deep learning has significantly accelerated drug discovery, with 'chemical language' processing (CLP) emerging as a prominent approach. CLP approaches learn from molecular string representations (e.g., Simplified Molecular Input Line Entry Systems [SMILES] and Self-Referencing Embedded Strings [SELFIES]) with methods akin to natural language processing. Despite their growing importance, training predictive CLP models is far from trivial, as it involves many 'bells and whistles'. Here, we analyze the key elements of CLP and provide guidelines for newcomers and experts. Our study spans three neural network architectures, two string representations, and three embedding strategies across ten bioactivity datasets, for both classification and regression purposes. This 'hitchhiker's guide' not only underscores the importance of certain methodological decisions, but also equips researchers with practical recommendations on ideal choices, e.g., in terms of neural network architectures, molecular representations, and hyperparameter optimization.
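To make the idea of treating molecules as 'chemical language' concrete, the following is a minimal sketch of how a SMILES string might be split into tokens and mapped to integer indices before being fed to a neural network. It is an illustration only, not the pipeline used in the study: the regular expression is a widely used atom-level tokenization pattern, and the function names (`tokenize_smiles`, `build_vocabulary`, `encode`) and the `<pad>` token are hypothetical choices made for this example.

```python
import re

# Atom-level SMILES tokenization pattern (an assumption for this sketch;
# the study's own tokenizer may differ). Bracket atoms such as [C@H] and
# two-letter halogens (Cl, Br) are kept as single tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse inputs containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES string: {smiles!r}")
    return tokens


def build_vocabulary(token_lists):
    """Map each distinct token to an integer index; index 0 is padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to indices, truncated or right-padded to max_len."""
    ids = [vocab[token] for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    molecules = ["CCl", "C[C@H](N)C(=O)O"]
    tokenized = [tokenize_smiles(s) for s in molecules]
    vocab = build_vocabulary(tokenized)
    for smiles, tokens in zip(molecules, tokenized):
        print(smiles, tokens, encode(tokens, vocab, max_len=12))
```

The resulting index sequences are what an embedding layer would consume. How the tokens are embedded (for instance, one-hot versus learned vectors) is one of the design choices the abstract refers to as embedding strategies.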
