
Data Science Meets Physical Organic Chemistry

Overview
Journal: Acc Chem Res
Specialty: Chemistry
Date: 2021 Aug 5
PMID: 34351757
Citations: 22
Abstract

Conspectus

At the heart of synthetic chemistry is the holy grail of predictable catalyst design. In particular, researchers involved in reaction development in asymmetric catalysis have pursued a variety of strategies toward this goal. This is driven by both the pragmatic need to achieve high selectivities and the inability to readily identify why a certain catalyst is effective for a given reaction. While empiricism and intuition have dominated the field of asymmetric catalysis since its inception, enantioselectivity offers a mechanistically rich platform to interrogate catalyst-structure response patterns that explain the performance of a particular catalyst or substrate.

In the early stages of an asymmetric reaction development campaign, the overarching mechanism of the reaction, catalyst speciation, the turnover limiting step, and many other details are unknown or posited based on related reactions. Considering the unclear details leading to a successful reaction, initial enantioselectivity data are often used to intuitively guide the ultimate direction of optimization. However, if the conditions of the Curtin-Hammett principle are satisfied, then measured enantioselectivity can be directly connected to the ensemble of diastereomeric transition states (TSs) that lead to the enantiomeric products, and the associated free energy difference between competing TSs (ΔΔG‡ = −RT ln[(S)/(R)], where (S) and (R) represent the concentrations of the enantiomeric products). We, and others, speculated that this important piece of information can be leveraged to guide reaction optimization in a quantitative way.

Although traditional linear free energy relationships (LFERs), such as Hammett plots, have been used to illuminate important mechanistic features, we sought to develop data science derived tools to expand the power of LFERs in order to describe complex reactions frequently encountered in modern asymmetric catalysis. Specifically, we investigated whether enantioselectivity data from a reaction can be quantitatively connected to the attributes of reaction components, such as catalyst and substrate structural features, to harness data for asymmetric catalyst design.

In this context, we developed a workflow to relate computationally derived features of reaction components to enantioselectivity using data science tools. The mathematical representation of molecules can incorporate many aspects of a transformation, such as molecular features from substrate, product, catalyst, and proposed transition states. Statistical models relating these features to reaction outputs can be used for various tasks, such as performance prediction of untested molecules. Perhaps most importantly, statistical models can guide the generation of mechanistic hypotheses that are embedded within complex patterns of reaction responses. Overall, merging traditional physical organic experiments with statistical modeling techniques creates a feedback loop that enables both evaluation of multiple mechanistic hypotheses and future catalyst design. In this Account, we highlight the evolution and application of this approach in the context of a collaborative program based on chiral phosphoric acid catalysts (CPAs) in asymmetric catalysis.
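
As an illustration of the quantitative link described in the abstract, the sketch below converts a measured enantiomeric excess into ΔΔG‡ through ΔΔG‡ = −RT ln([S]/[R]) and then fits a plain least-squares linear model of ΔΔG‡ against molecular descriptors. It is a minimal sketch under stated assumptions, not the authors' workflow: the function names, the choice of kcal/mol, the default temperature of 298.15 K, and ordinary least squares as the statistical model are all assumptions made here for illustration, and any descriptor values passed in would have to come from real computations.

```python
import math

import numpy as np

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def er_from_ee(ee_percent):
    """Convert a signed ee (%, positive when S is the major product) to [S]/[R]."""
    ee = ee_percent / 100.0
    return (1.0 + ee) / (1.0 - ee)


def ddg_from_er(er, temperature_k=298.15):
    """Return ddG = -R*T*ln([S]/[R]) in kcal/mol (negative when S is favored)."""
    return -R_KCAL * temperature_k * math.log(er)


def fit_linear_model(features, ddg):
    """Ordinary least squares: ddG ~ intercept + features @ coefficients.

    `features` is an (n_reactions, n_descriptors) array of hypothetical
    descriptor values; the returned array is [intercept, coef_1, ..., coef_k].
    """
    features = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, np.asarray(ddg, dtype=float), rcond=None)
    return coef


if __name__ == "__main__":
    # A 90% ee corresponds to a 19:1 ratio of enantiomers.
    ratio = er_from_ee(90.0)
    print(f"er = {ratio:.1f}, ddG = {ddg_from_er(ratio):.2f} kcal/mol")
```

Working in ΔΔG‡ rather than raw ee keeps the response on an energy scale, which is what makes a linear free energy style model against structural descriptors a reasonable first choice.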

Citing Articles

Connecting the complexity of stereoselective synthesis to the evolution of predictive tools.

Li J, Reid J. Chem Sci. 2025; 16(9):3832-3851.

PMID: 39911341 PMC: 11791519. DOI: 10.1039/d4sc07461k.


Applying statistical modeling strategies to sparse datasets in synthetic chemistry.

Haas B, Kalyani D, Sigman M. Sci Adv. 2025; 11(1):eadt3013.

PMID: 39742471 PMC: 11691635. DOI: 10.1126/sciadv.adt3013.


Data-Driven Discovery of a New Fluorescent BASHY Dye for Bioimaging.

Ravasco J, Felicidade J, Pinto M, Santos F, Campos-Gonzalez R, Arteaga J. JACS Au. 2024; 4(11):4212-4222.

PMID: 39610736 PMC: 11600176. DOI: 10.1021/jacsau.4c00473.


Experimentally-based Fe-catalyzed ethene oligomerization machine learning model provides highly accurate prediction of propagation/termination selectivity.

Yang B, Schaefer A, Small B, Leseberg J, Bischof S, Webster-Gardiner M. Chem Sci. 2024; .

PMID: 39449687 PMC: 11495513. DOI: 10.1039/d4sc03433c.


Supramolecular Catalyzed Cascade Reduction of Azaarenes Interrogated via Data Science.

Treacy S, Smith A, Bergman R, Raymond K, Toste F. J Am Chem Soc. 2024; 146(43):29792-29800.

PMID: 39432827 PMC: 11528432. DOI: 10.1021/jacs.4c11482.

