Inferring Experimental Procedures from Text-based Representations of Chemical Reactions
Overview
The experimental execution of chemical reactions is a context-dependent and time-consuming process, often carried out by relying on experience gathered over decades of laboratory work or by searching for similar, previously executed experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, converting proposed synthetic routes into experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models that predict the entire sequence of synthesis steps from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the resulting data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
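To make the nearest-neighbor idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes reactions are written as reaction SMILES strings and replaces the reaction fingerprints mentioned above with a toy character n-gram count vector. The names `toy_fingerprint`, `cosine`, `predict_actions`, and `EXAMPLES`, as well as the two example reactions and their action lists, are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter

def toy_fingerprint(reaction_smiles, n=3):
    """Character n-gram counts; a toy stand-in for a real reaction fingerprint."""
    return Counter(reaction_smiles[i:i + n]
                   for i in range(len(reaction_smiles) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def predict_actions(query, dataset):
    """Return the action sequence of the most similar known reaction."""
    query_fp = toy_fingerprint(query)
    _, actions = max(dataset,
                     key=lambda pair: cosine(query_fp, toy_fingerprint(pair[0])))
    return actions

# Invented examples: (reaction SMILES, action sequence). Not taken from the data set.
EXAMPLES = [
    ("CC(=O)O.OCC>>CC(=O)OCC",
     ["ADD acid", "ADD alcohol", "STIR", "CONCENTRATE"]),
    ("c1ccccc1Br.OB(O)c1ccccc1>>c1ccccc1-c1ccccc1",
     ["ADD aryl halide", "ADD boronic acid", "STIR", "FILTER"]),
]

# Query: a closely related esterification (methanol instead of ethanol).
predicted = predict_actions("CC(=O)O.OC>>CC(=O)OC", EXAMPLES)
print(predicted)
```

A retrieval baseline of this kind can only return procedures that already exist in the data, whereas the sequence-to-sequence models generate the action sequence token by token.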