
Variational Autoencoder-based Chemical Latent Space for Large Molecular Structures with 3D Complexity

Overview
Journal: Commun Chem
Publisher: Springer Nature
Specialty: Chemistry
Date: 2023 Nov 17
PMID: 37973971
Abstract

The structural diversity of chemical libraries, which are systematic collections of compounds that have the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features. It expresses the structural diversity within a compound library, which makes it possible to explore a broader chemical space and generate novel compound structures as drug candidates. In this study, we developed a deep-learning method called NP-VAE (Natural Product-oriented Variational Autoencoder), based on a variational autoencoder, for handling hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE succeeded in constructing a chemical latent space from large compounds that existing methods could not handle, achieved higher reconstruction accuracy, and showed stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we succeeded in comprehensively analyzing a compound library containing natural compounds and in generating novel compound structures with optimized functions.
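
As a rough illustration of what a variational-autoencoder latent space involves, the sketch below shows three generic VAE ingredients in plain Python: sampling a latent point from an encoder's mean and log-variance (the reparameterization trick), the KL term that regularizes the latent space toward a standard normal, and linear interpolation between two latent points. It is not the NP-VAE architecture or code; the function names and the two-dimensional "encoder outputs" are hypothetical values chosen only for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps per dimension, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, sigma^2) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def interpolate(z_a, z_b, t):
    """Linear interpolation between two latent points, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


# Hypothetical encoder outputs for two compounds (assumed values, 2-D latent).
mu_a, log_var_a = [0.5, -1.0], [-2.0, -2.0]
mu_b, log_var_b = [-0.5, 1.0], [-2.0, -2.0]

rng = random.Random(0)                       # fixed seed for repeatability
z_a = reparameterize(mu_a, log_var_a, rng)   # one stochastic latent sample
kl_a = kl_to_standard_normal(mu_a, log_var_a)
midpoint = interpolate(mu_a, mu_b, 0.5)      # point halfway between the means
```

In a trained model, a decoder would map points such as `midpoint` back to compound structures; that step depends on the specific model and is omitted here.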

Citing Articles

Improving Molecular Design with Direct Inverse Analysis of QSAR/QSPR Model.

Shino Y, Kaneko H Mol Inform. 2025; 44(1):e202400227.

PMID: 39797757 PMC: 11724648. DOI: 10.1002/minf.202400227.


Multi-Objective Design of DNA-Stabilized Nanoclusters Using Variational Autoencoders With Automatic Feature Extraction.

Sadeghi E, Mastracco P, Gonzalez-Rosell A, Copp S, Bogdanov P ACS Nano. 2024; 18(39):26997-27008.

PMID: 39288200 PMC: 11447918. DOI: 10.1021/acsnano.4c09640.


Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators.

Gao X, Baimacheva N, Aires-de-Sousa J Molecules. 2024; 29(16).

PMID: 39203047 PMC: 11357237. DOI: 10.3390/molecules29163969.


Chemical language modeling with structured state space sequence models.

Ozcelik R, de Ruiter S, Criscuolo E, Grisoni F Nat Commun. 2024; 15(1):6176.

PMID: 39039051 PMC: 11263548. DOI: 10.1038/s41467-024-50469-9.
