
Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

Overview
Journal ACS Cent Sci
Specialty Chemistry
Date 2019 Oct 2
PMID 31572784
Citations 200
Abstract

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: Given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between simplified molecular-input line-entry system (SMILES) strings (a text-based representation) of reactants, reagents, and the products. We show that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set. Molecular Transformer makes predictions by inferring the correlations between the presence and absence of chemical motifs in the reactant, reagent, and product present in the data set. Our model requires no handcrafted rules and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without a reactant-reagent split and including stereochemistry, which makes our method universally applicable.
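The translation framing described in the abstract can be illustrated with a minimal Python sketch. It shows how reactant and reagent SMILES might be joined into a single, unsplit source sequence of tokens, and how a sequence-level confidence could be derived from per-token probabilities. Everything here is an assumption for illustration, not the authors' implementation: the names `SMILES_TOKEN`, `tokenize`, `make_source`, and `sequence_confidence` are hypothetical, the tokenizer pattern is a simplified one, and the product-of-token-probabilities score is one common choice for sequence models rather than something stated in the abstract.

```python
import math
import re
from typing import List, Sequence

# Hypothetical, simplified token pattern (an assumption, not the paper's):
# bracket atoms, two-letter halogens, two-digit ring closures, then any
# single character (atoms, bonds, branches, ring digits, the "." separator).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens without losing any characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: joining the tokens must reproduce the input exactly.
    assert "".join(tokens) == smiles
    return tokens


def make_source(reactants: Sequence[str], reagents: Sequence[str]) -> str:
    """Build a space-separated source sequence with no reactant-reagent split.

    All molecules are joined with "." so the model input does not encode
    which species are reactants and which are reagents.
    """
    mixed = ".".join(list(reactants) + list(reagents))
    return " ".join(tokenize(mixed))


def sequence_confidence(token_log_probs: Sequence[float]) -> float:
    """Assumed confidence score: product of the predicted token probabilities.

    Computed as exp(sum of log-probabilities) for numerical stability.
    """
    return math.exp(sum(token_log_probs))


if __name__ == "__main__":
    # Acetyl chloride and ethanol as reactants, pyridine as a reagent.
    print(make_source(["CC(=O)Cl", "OCC"], ["c1ccncc1"]))
    # Two decoded tokens with probabilities 0.9 and 0.8.
    print(sequence_confidence([math.log(0.9), math.log(0.8)]))
```

Under these assumptions, a prediction would be flagged as uncertain when its score falls below a threshold chosen on held-out data; the threshold itself is not specified in the abstract.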

Citing Articles

Computational tools for the prediction of site- and regioselectivity of organic reactions.

Sigmund L, Assante M, Johansson M, Norrby P, Jorner K, Kabeshov M Chem Sci. 2025.

PMID: 40070469 PMC: 11891785. DOI: 10.1039/d5sc00541h.


Accelerating the inference of string generation-based chemical reaction models for industrial applications.

Andronov M, Andronova N, Wand M, Schmidhuber J, Clevert D J Cheminform. 2025; 17(1):31.

PMID: 40065398 PMC: 11895308. DOI: 10.1186/s13321-025-00974-w.


Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates.

Schleinitz J, Carretero-Cerdan A, Gurajapu A, Harnik Y, Lee G, Pandey A J Am Chem Soc. 2025; 147(9):7476-7484.

PMID: 39982221 PMC: 11887056. DOI: 10.1021/jacs.4c15902.


Predictive modeling of biodegradation pathways using transformer architectures.

Brydon L, Zhang K, Dobbie G, Taskova K, Wicker J J Cheminform. 2025; 17(1):21.

PMID: 39962584 PMC: 11834682. DOI: 10.1186/s13321-025-00969-7.


Application of Transformers to Chemical Synthesis.

Jin D, Liang Y, Xiong Z, Yang X, Wang H, Zeng J Molecules. 2025; 30(3).

PMID: 39942600 PMC: 11821105. DOI: 10.3390/molecules30030493.

