Variational Autoencoder-based Chemical Latent Space for Large Molecular Structures with 3D Complexity
Overview
The structural diversity of chemical libraries, which are systematic collections of compounds with the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features. It expresses the structural diversity within a compound library, which makes it possible to explore a broader chemical space and to generate novel compound structures as drug candidates. In this study, we developed a deep-learning method based on a variational autoencoder, called NP-VAE (Natural Product-oriented Variational Autoencoder), for handling hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE successfully constructed a chemical latent space from large compounds that existing methods could not handle, achieved higher reconstruction accuracy than those methods, and showed stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we comprehensively analyzed a compound library containing natural compounds and generated novel compound structures with optimized functions.
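As a rough illustration of what a variational autoencoder latent space involves, the sketch below shows three generic building blocks: sampling a latent vector with the reparameterization trick, the KL term that regularizes the latent space toward a standard normal prior, and linear interpolation between two latent points as one simple way to explore the space. This is not the NP-VAE architecture, which the abstract does not detail; the function names `reparameterize`, `kl_to_standard_normal`, and `interpolate` are hypothetical, and the encoder and decoder networks are omitted entirely.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per latent dimension.

    mu and log_var stand in for the outputs of a (hypothetical) encoder.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(sigma^2)) to N(0, I), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced points on the segment from z_a to z_b."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        points.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative encoder outputs for two compounds (made-up numbers).
    z_a = reparameterize([0.0, 1.0], [-2.0, -2.0], rng)
    z_b = reparameterize([1.5, -0.5], [-2.0, -2.0], rng)
    for z in interpolate(z_a, z_b, 5):
        # In a full model, each z would be passed to a decoder
        # to produce a candidate structure.
        print([round(v, 3) for v in z])
```

In a trained model, decoding the intermediate points would yield candidate structures lying between the two input compounds; whether those candidates are chemically valid depends on the decoder, which this sketch does not model.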