
Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering

Overview
Journal: ACS Cent Sci
Specialty: Chemistry
Date: 2024 Mar 4
PMID: 38435522
Abstract

Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery, by functionally annotating known protein sequences or generating novel protein sequences with desired functions, and (2) navigating protein fitness landscapes for fitness optimization, by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
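To make "learning mappings between protein sequences and their associated fitness values" concrete, the following is a minimal sketch and not the authors' method. It one-hot encodes short sequences, fits a linear (additive) model by gradient descent, and ranks unmeasured variants by predicted fitness. Everything specific here is an assumption for illustration: the three-site library over the residues A, G, L, and V, the `toy_fitness` function standing in for an assay, and all function names.

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector of length len(seq) * 20."""
    vec = [0.0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def predict(weights, bias, seq):
    """Linear model: bias plus the weights of the residues present in seq."""
    return bias + sum(w * x for w, x in zip(weights, one_hot(seq)))


def fit_linear(seqs, fitness, lr=0.05, epochs=300):
    """Fit the linear model by per-sample gradient descent on squared error."""
    X = [one_hot(s) for s in seqs]
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, fitness):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


# HYPOTHETICAL stand-in for an assay: an additive, noise-free toy landscape.
TOY_EFFECT = {"A": 0.0, "G": 0.2, "L": 0.5, "V": 1.0}


def toy_fitness(seq):
    return sum((pos + 1) * TOY_EFFECT[aa] for pos, aa in enumerate(seq))


# All 64 variants at three hypothetical sites; "measure" every third one.
variants = ["".join(p) for p in itertools.product("AGLV", repeat=3)]
train = variants[::3]
held_out = [v for v in variants if v not in train]

weights, bias = fit_linear(train, [toy_fitness(v) for v in train])

# Rank the unmeasured variants by predicted fitness and pick the top one.
best = max(held_out, key=lambda v: predict(weights, bias, v))
print(best, round(predict(weights, bias, best), 2))
```

The toy landscape is additive by construction, so a linear model can recover it from a fraction of the library. Measured fitness data are typically noisy and can involve interactions between sites (epistasis), which an additive model cannot capture, so this sketch is best read as a baseline for the sequence-to-fitness idea rather than a practical recipe.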

Citing Articles

EC2Vec: A Machine Learning Method to Embed Enzyme Commission (EC) Numbers into Vector Representations.

Liu M, Ni X, Ramanujam J, Brylinski M. J Chem Inf Model. 2025; 65(5):2173-2179.

PMID: 39981640 PMC: 11898066. DOI: 10.1021/acs.jcim.4c02161.


A Practical Guide to Computational Tools for Engineering Biocatalytic Properties.

Vega A, Planas A, Biarnes X. Int J Mol Sci. 2025; 26(3).

PMID: 39940748 PMC: 11817184. DOI: 10.3390/ijms26030980.


Integrating protein language models and automatic biofoundry for enhanced protein evolution.

Zhang Q, Chen W, Qin M, Wang Y, Pu Z, Ding K. Nat Commun. 2025; 16(1):1553.

PMID: 39934638 PMC: 11814318. DOI: 10.1038/s41467-025-56751-8.


Large-scale energy decomposition for the analysis of protein stability.

Mansoor S, Frasnetti E, Cucchi I, Magni A, Bonollo G, Serapian S. Cell Stress Chaperones. 2025; 30(1):57-68.

PMID: 39884551 PMC: 11847297. DOI: 10.1016/j.cstres.2025.01.001.


Active learning-assisted directed evolution.

Yang J, Lal R, Bowden J, Astudillo R, Hameedi M, Kaur S. Nat Commun. 2025; 16(1):714.

PMID: 39821082 PMC: 11739421. DOI: 10.1038/s41467-025-55987-8.

