
Autonomous Design of New Chemical Reactions Using a Variational Autoencoder

Overview
Journal: Commun Chem
Publisher: Springer Nature
Specialty: Chemistry
Date: 2023 Jan 25
PMID: 36697652
Abstract

Artificial-intelligence-based chemistry models are a promising way to explore chemical reaction design spaces. However, experimental synthesis datasets typically report only the optimal synthesis reactions, which introduces an inherited bias into model predictions. Robust datasets that span the entire solution space are therefore needed to remove this bias and allow the model to be trained on the full space. In this study, an artificial intelligence model based on a Variational AutoEncoder (VAE) is developed and investigated as a means of synthetically generating continuous datasets. The approach samples the latent space to generate new chemical reactions. The technique is demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than those in the training set.
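The latent-space sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the vocabulary, latent dimension, sequence length, and the randomly initialised linear decoder below are all hypothetical stand-ins chosen for illustration, since the abstract does not specify the reaction encoding or the network architecture. The sketch only shows the general VAE pattern of drawing a latent vector from the standard normal prior and decoding it into a token string.

```python
import math
import random

# Hypothetical toy vocabulary and sizes; the abstract does not give the real encoding.
VOCAB = ["C", "O", "N", "=", "(", ")", ">>", "."]
LATENT_DIM = 4
SEQ_LEN = 6


def make_decoder_weights(rng):
    """Random stand-in for trained decoder weights, indexed [position][token][latent_dim]."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
             for _ in VOCAB]
            for _ in range(SEQ_LEN)]


def reparameterize(mu, log_var, rng):
    """Training-time sampling used by VAEs: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def sample_prior(rng):
    """Generation-time sampling: draw z directly from the N(0, I) prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def decode(z, weights):
    """Greedy decode: at each position pick the token with the highest linear score."""
    tokens = []
    for position in weights:
        scores = [sum(w * zi for w, zi in zip(row, z)) for row in position]
        tokens.append(VOCAB[scores.index(max(scores))])
    return "".join(tokens)


def generate(n, seed=0):
    """Sample n latent points from the prior and decode each into a token string."""
    rng = random.Random(seed)
    weights = make_decoder_weights(rng)
    return [decode(sample_prior(rng), weights) for _ in range(n)]


if __name__ == "__main__":
    for s in generate(5):
        print(s)
```

Because the decoder here is untrained, the printed strings are not valid reactions. In a trained model, the same loop over prior samples is what would let a small training set yield a much larger set of generated candidates, each of which would still need a validity check.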

Citing Articles

Fuzz Testing Molecular Representation Using Deep Variational Anomaly Generation.

Nogueira V, Sharma R, Guido R, Keiser M J Chem Inf Model. 2025; 65(4):1911-1927.

PMID: 39908426 PMC: 11863373. DOI: 10.1021/acs.jcim.4c01876.


Into the Unknown: How Computation Can Help Explore Uncharted Material Space.

Mroz A, Posligua V, Tarzia A, Wolpert E, Jelfs K J Am Chem Soc. 2022; 144(41):18730-18743.

PMID: 36206484 PMC: 9585593. DOI: 10.1021/jacs.2c06833.
