Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C-O Couplings

Overview
Journal J Am Chem Soc
Specialty Chemistry
Date 2022 Aug 8
PMID 35939717
Abstract

Synthetic yield prediction using machine learning is intensively studied. Previous work has focused on two categories of data sets: high-throughput experimentation data, as an ideal case study, and data sets extracted from proprietary databases, which are known to have a strong reporting bias toward high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a data set on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Ultimately, we hope that this unique public database will foster further improvements in machine learning methods for reaction yield prediction in a more realistic context.

Citing Articles

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates.

Schleinitz J, Carretero-Cerdan A, Gurajapu A, Harnik Y, Lee G, Pandey A. J Am Chem Soc. 2025; 147(9):7476-7484.

PMID: 39982221 PMC: 11887056. DOI: 10.1021/jacs.4c15902.


Predicting and Explaining Yields with Machine Learning for Carboxylated Azoles and Beyond.

Janssen K, Proppe J. J Chem Inf Model. 2025; 65(4):1862-1872.

PMID: 39916507 PMC: 11863374. DOI: 10.1021/acs.jcim.4c02336.


Recommending reaction conditions with label ranking.

Shim E, Tewari A, Cernak T, Zimmerman P. Chem Sci. 2025; 16(9):4109-4118.

PMID: 39906388 PMC: 11788591. DOI: 10.1039/d4sc06728b.


Using Classifiers To Predict Catalyst Design for Polyketone Microstructure.

Wong Y, Jung H, Lin S, Shammami M, Roshandel H, Dodge H. J Am Chem Soc. 2025; 147(5):3913-3918.

PMID: 39849304 PMC: 11803615. DOI: 10.1021/jacs.4c11666.


CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions.

Zeng Z, Guo J, Jin J, Luo X. J Cheminform. 2025; 17(1):2.

PMID: 39773344 PMC: 11707929. DOI: 10.1186/s13321-024-00944-8.