Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki-Miyaura Coupling

Overview
Journal J Am Chem Soc
Specialty Chemistry
Date 2022 Mar 8
PMID 35258973
Abstract

Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated literature data may be insufficient for this purpose. Using an example of Suzuki-Miyaura coupling with heterocyclic building blocks (and a carefully selected database of >10,000 literature examples), we show that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases. This result holds irrespective of the ML model applied (from simple feed-forward to state-of-the-art graph-convolution neural networks) or the representation to describe the reaction partners (various fingerprints, chemical descriptors, latent representations, etc.). In all cases, the ML methods fail to perform significantly better than naive assignments based on the sheer frequency of certain reaction conditions reported in the literature. These unsatisfactory results likely reflect subjective preferences of various chemists to use certain protocols, other biasing factors as mundane as availability of certain solvents/reagents, and/or a lack of negative data. These findings highlight the likely importance of systematically generating reliable and standardized data sets for algorithm training.
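The "naive assignment based on sheer frequency" mentioned in the abstract can be pictured as a popularity baseline: always suggest the conditions reported most often in the training data, then measure how often the reported conditions of held-out reactions fall among those suggestions. The sketch below is only an illustration of that idea, not the authors' code. The function names `popularity_baseline` and `top_k_accuracy`, the top-k scoring, and the toy (solvent, base) labels are all assumptions made for this example.

```python
from collections import Counter


def popularity_baseline(train_labels, k=3):
    """Return the k most frequently reported condition labels in the training set."""
    return [label for label, _ in Counter(train_labels).most_common(k)]


def top_k_accuracy(predictions, true_labels):
    """Fraction of test reactions whose reported label is among the predictions."""
    hits = sum(1 for label in true_labels if label in predictions)
    return hits / len(true_labels)


# Toy (solvent, base) labels, invented for illustration only (not literature data).
train = (
    [("dioxane", "K2CO3")] * 5
    + [("DMF", "Na2CO3")] * 3
    + [("toluene", "K3PO4")] * 2
)
test = [
    ("dioxane", "K2CO3"),
    ("DMF", "Na2CO3"),
    ("toluene", "K3PO4"),
    ("THF", "CsF"),
]

# The baseline ignores the substrates entirely and suggests the same conditions every time.
top1 = popularity_baseline(train, k=1)
top2 = popularity_baseline(train, k=2)

print(top_k_accuracy(top1, test))  # 0.25
print(top_k_accuracy(top2, test))  # 0.5
```

Under this framing, a trained model is informative only if its accuracy on the same held-out reactions clearly exceeds that of the baseline, which never looks at the reaction partners.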

Citing Articles

Recommending reaction conditions with label ranking.

Shim E, Tewari A, Cernak T, Zimmerman P. Chem Sci. 2025; 16(9):4109-4118.

PMID: 39906388 PMC: 11788591. DOI: 10.1039/d4sc06728b.


Systematic, computational discovery of multicomponent and one-pot reactions.

Roszak R, Gadina L, Wolos A, Makkawi A, Mikulak-Klucznik B, Bilgi Y. Nat Commun. 2024; 15(1):10285.

PMID: 39604395 PMC: 11603032. DOI: 10.1038/s41467-024-54611-5.


Estimation of multicomponent reactions' yields from networks of mechanistic steps.

Szymkuc S, Wolos A, Roszak R, Grzybowski B. Nat Commun. 2024; 15(1):10286.

PMID: 39604372 PMC: 11603315. DOI: 10.1038/s41467-024-54550-1.


Machine learning-guided strategies for reaction conditions design and optimization.

Chen L, Li Y. Beilstein J Org Chem. 2024; 20:2476-2492.

PMID: 39376489 PMC: 11457048. DOI: 10.3762/bjoc.20.212.


Can Deep Learning Search for Exceptional Chiroptical Properties? The Halogenated [6]Helicene Case.

Uceda R, Gijon A, Miguez-Lago S, Cruz C, Blanco V, Fernandez-Alvarez F. Angew Chem Int Ed Engl. 2024; 63(49):e202409998.

PMID: 39329214 PMC: 11586703. DOI: 10.1002/anie.202409998.

