
Inferring Experimental Procedures from Text-based Representations of Chemical Reactions

Overview
Journal Nat Commun
Specialty Biology
Date 2021 May 7
PMID 33958589
Citations 18
Abstract

The experimental execution of chemical reactions is a context-dependent and time-consuming process, often addressed by drawing on experience collected over multiple decades of laboratory work or by searching for similar, previously executed experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes into experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models that predict the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents using state-of-the-art natural language models. We used the resulting data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
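The nearest-neighbor idea mentioned in the abstract can be illustrated with a small sketch: given a fingerprint for a query reaction, find the most similar reaction in a training set and reuse its action sequence. This is not the authors' implementation. The fingerprints below are toy vectors standing in for the reaction fingerprints the abstract refers to, cosine similarity is an assumed metric, and the action strings (`ADD`, `STIR`, `FILTER`, ...) are hypothetical placeholders rather than the paper's exact action vocabulary.

```python
"""Minimal nearest-neighbor sketch (hypothetical, not the paper's code).

Assumptions: each reaction has a precomputed fingerprint given as a plain
float vector, similarity is cosine similarity, and action strings are
illustrative placeholders.
"""
import math
from dataclasses import dataclass


@dataclass
class Example:
    reaction_smiles: str      # textual representation of the chemical equation
    fingerprint: list[float]  # hypothetical precomputed reaction fingerprint
    actions: list[str]        # associated (illustrative) action sequence


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def predict_actions(query_fp: list[float], train: list[Example]) -> list[str]:
    """Copy the action sequence of the training reaction most similar to the query."""
    if not train:
        raise ValueError("training set is empty")
    best = max(train, key=lambda ex: cosine(query_fp, ex.fingerprint))
    return list(best.actions)


if __name__ == "__main__":
    # Toy training set: an esterification and an amide formation,
    # with made-up 3-dimensional fingerprints for illustration only.
    train = [
        Example("CC(=O)O.CCO>>CCOC(C)=O", [1.0, 0.0, 0.2],
                ["ADD CC(=O)O", "ADD CCO", "STIR at reflux", "CONCENTRATE"]),
        Example("CC(=O)Cl.NCc1ccccc1>>CC(=O)NCc1ccccc1", [0.0, 1.0, 0.1],
                ["ADD NCc1ccccc1", "ADD CC(=O)Cl", "STIR at room temperature", "FILTER"]),
    ]
    print(predict_actions([0.9, 0.1, 0.2], train))
```

Here the query vector is closest to the esterification example, so its four actions are returned unchanged. A retrieval model of this kind can only reproduce procedures already present in the data set, which is one motivation for also training sequence-to-sequence models that generate new action sequences.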

Citing Articles

Artificial intelligence in drug development.

Zhang K, Yang X, Wang Y, Yu Y, Huang N, Li G. Nat Med. 2025; 31(1):45-59.

PMID: 39833407 DOI: 10.1038/s41591-024-03434-4.

Leveraging infrared spectroscopy for automated structure elucidation.

Alberts M, Laino T, Vaucher A. Commun Chem. 2024; 7(1):268.

PMID: 39550488 PMC: 11569215. DOI: 10.1038/s42004-024-01341-w.

Machine learning-guided strategies for reaction conditions design and optimization.

Chen L, Li Y. Beilstein J Org Chem. 2024; 20:2476-2492.

PMID: 39376489 PMC: 11457048. DOI: 10.3762/bjoc.20.212.

Deep learning in template-free de novo biosynthetic pathway design of natural products.

Xie X, Gui L, Qiao B, Wang G, Huang S, Zhao Y. Brief Bioinform. 2024; 25(6).

PMID: 39373052 PMC: 11456888. DOI: 10.1093/bib/bbae495.

GEMTELLIGENCE: Accelerating gemstone classification with deep learning.

Bendinelli T, Biggio L, Nyfeler D, Ghosh A, Tollan P, Kirschmann M. Commun Eng. 2024; 3(1):110.

PMID: 39164470 PMC: 11336078. DOI: 10.1038/s44172-024-00252-x.

