CLAIRE: a Contrastive Learning-based Predictor for EC Number of Chemical Reactions

Overview
Journal J Cheminform
Publisher BioMed Central
Specialty Chemistry
Date 2025 Jan 8
PMID 39773344
Abstract

Predicting Enzyme Commission (EC) numbers for chemical reactions enables efficient enzymatic annotation for computer-aided synthesis planning. However, conventional machine learning approaches struggle with data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction's EC), a framework that combines contrastive learning, reaction embeddings from a pre-trained language model, and data augmentation to address these limitations. CLAIRE achieved weighted average F1 scores of 0.861 on the testing set (n = 18,816) and 0.911 on an independent dataset (n = 1040) derived from a yeast metabolic model. On these two datasets, CLAIRE outperformed the previous state-of-the-art model by 3.65-fold and 1.18-fold, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub (https://github.com/zishuozeng/CLAIRE).

Scientific contribution
This work employs contrastive learning to predict the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and class imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning.
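The headline metric above is a weighted average F1 score. The abstract does not say how it was computed, so the following is only a minimal sketch assuming the common support-weighted definition: each class's F1 is weighted by that class's share of the true labels. The function name `weighted_f1` and the toy EC labels are hypothetical and are not taken from CLAIRE's code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)        # number of true examples per class
    predicted = Counter(y_pred)      # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Classes with more true examples contribute proportionally more.
        score += (n_true / total) * f1
    return score


# Toy, made-up labels: three reactions of one EC class, one of another.
y_true = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.7.1.1"]
y_pred = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Under this definition, frequent EC classes dominate the score, so on an imbalanced dataset a weighted F1 can stay high even when rare classes are predicted poorly. A macro-averaged F1, which weights every class equally, is the usual complement for checking that.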
