
Language Models for the Prediction of SARS-CoV-2 Inhibitors

Overview
Date 2024 Apr 11
PMID 38603308
Abstract

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We use deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a BERT language model on ∼9.6 billion molecules, reaching a peak performance of 603 petaflops in mixed precision. Compared with previous efforts using this architecture, our work reduces pre-training time from days to hours while increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model on an assembled set of thousands of protein targets with binding affinity data, then searched for inhibitors of two specific targets, SARS-CoV-2 Mpro and PLpro. To find optimal candidates, we used a genetic algorithm that draws on the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
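
The generate-and-score loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `genetic_search`, `toy_mutate`, and `toy_score` are hypothetical names, the mutation is a random character swap standing in for language-model generation, and the score is a simple character count standing in for predicted binding affinity. The strings are plain placeholders and are not checked for chemical validity.

```python
import random
from typing import Callable, List


def genetic_search(
    population: List[str],
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    generations: int = 10,
    keep: int = 4,
    rng: random.Random = None,
) -> List[str]:
    """Toy genetic loop: propose mutated children, then keep the top scorers."""
    rng = rng or random.Random(0)
    population = list(population)
    for _ in range(generations):
        # Generation step: every survivor proposes one mutated child.
        children = [mutate(parent, rng) for parent in population]
        # De-duplicate parents and children while preserving order.
        pool = list(dict.fromkeys(population + children))
        # Scoring step: rank the pool and keep the best candidates.
        # Parents stay in the pool, so the best score never decreases.
        pool.sort(key=score, reverse=True)
        population = pool[:keep]
    return population


# Hypothetical stand-ins (assumptions, not from the paper):
ALPHABET = "CNO"


def toy_mutate(s: str, rng: random.Random) -> str:
    """Replace one random position; stands in for model-based generation."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def toy_score(s: str) -> float:
    """Count 'N' characters; stands in for a predicted binding affinity."""
    return float(s.count("N"))


initial = ["CCCC", "CCOC"]
best = genetic_search(initial, toy_mutate, toy_score,
                      generations=20, keep=4, rng=random.Random(0))
```

In a real workflow the two callables would wrap the pre-trained generator and the fine-tuned affinity predictor; keeping them as parameters lets the selection loop stay unchanged when those models are swapped.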

Citing Articles

Automation and machine learning augmented by large language models in a catalysis study.

Su Y, Wang X, Ye Y, Xie Y, Xu Y, Jiang Y. Chem Sci. 2024; 15(31):12200-12233.

PMID: 39118602 PMC: 11304797. DOI: 10.1039/d3sc07012c.


Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms.

Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Patterns (N Y). 2024; 5(4):100947.

PMID: 38645768 PMC: 11026973. DOI: 10.1016/j.patter.2024.100947.


Deep learning workflow for the inverse design of molecules with specific optoelectronic properties.

Yoo P, Bhowmik D, Mehta K, Zhang P, Liu F, Lupo Pasini M. Sci Rep. 2023; 13(1):20031.

PMID: 37973879 PMC: 10654498. DOI: 10.1038/s41598-023-45385-9.


Two excited-state datasets for quantum chemical UV-vis spectra of organic molecules.

Lupo Pasini M, Mehta K, Yoo P, Irle S. Sci Data. 2023; 10(1):546.

PMID: 37604820 PMC: 10442335. DOI: 10.1038/s41597-023-02408-4.


Adaptive language model training for molecular design.

Blanchard A, Bhowmik D, Fox Z, Gounley J, Glaser J, Akpa B. J Cheminform. 2023; 15(1):59.

PMID: 37291633 PMC: 10249556. DOI: 10.1186/s13321-023-00719-7.

