Unsupervised Evolution of Protein and Antibody Complexes with a Structure-informed Language Model

Overview
Journal Science
Specialty Science
Date 2024 Jul 4
PMID 38963838
Abstract

Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
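The abstract does not spell out how candidate variants are chosen, so the following is only a minimal sketch of one plausible reading: score every single-residue substitution with a likelihood function and keep the top-ranked few for screening. The names `rank_point_mutants` and `toy_log_likelihood` are hypothetical, and the toy scorer is a placeholder for illustration only; it is not ESM-IF1 and does not use any structural information. In practice the scorer would be a model that returns the log-likelihood of a sequence conditioned on backbone coordinates.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def rank_point_mutants(
    wild_type: str,
    log_likelihood: Callable[[str], float],
    top_k: int = 30,
) -> List[Tuple[str, float]]:
    """Rank single-residue substitutions by log-likelihood gain over wild type.

    `log_likelihood` is an assumed callable standing in for a
    structure-conditioned model; it maps a sequence to a score.
    """
    base = log_likelihood(wild_type)
    scored = []
    for pos, wt_aa in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # skip the identity "mutation"
            mutant = wild_type[:pos] + aa + wild_type[pos + 1:]
            label = f"{wt_aa}{pos + 1}{aa}"  # e.g. "K2A", 1-based position
            scored.append((label, log_likelihood(mutant) - base))
    # Highest gain first; ties keep their enumeration order (stable sort).
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_log_likelihood(seq: str) -> float:
    """Hypothetical placeholder scorer: simply counts alanines."""
    return float(seq.count("A"))


if __name__ == "__main__":
    for label, gain in rank_point_mutants("MKT", toy_log_likelihood, top_k=3):
        print(label, gain)
```

The default `top_k=30` mirrors the "about 30 variants" screened per the abstract; how those variants were actually selected is an assumption here.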

Citing Articles

De Novo Design of Large Polypeptides Using a Lightweight Diffusion Model Integrating LSTM and Attention Mechanism Under Per-Residue Secondary Structure Constraints.

Liao S, Xu G, Jin L, Ma J Molecules. 2025; 30(5).

PMID: 40076339 PMC: 11902264. DOI: 10.3390/molecules30051116.


AI in SERS sensing moving from discriminative to generative.

Quarin S, Vang D, Dima R, Stan G, Strobbia P NPJ Biosens. 2025; 2(1):9.

PMID: 39991468 PMC: 11845314. DOI: 10.1038/s44328-025-00033-2.


FASTIA: A rapid and accessible platform for protein variant interaction analysis demonstrated with a single-domain antibody.

Matsunaga R, Tsumoto K Protein Sci. 2025; 34(3):e70065.

PMID: 39981938 PMC: 11843469. DOI: 10.1002/pro.70065.


Physical-aware model accuracy estimation for protein complex using deep learning method.

Wang H, Sun M, Xie L, Liu D, Zhang G Comput Struct Biotechnol J. 2025; 27:478-487.

PMID: 39916698 PMC: 11799971. DOI: 10.1016/j.csbj.2025.01.017.


Discovery of highly active kynureninases for cancer immunotherapy through protein language model.

Eom H, Park S, Cho K, Lee J, Kim H, Kim S Nucleic Acids Res. 2025; 53(1).

PMID: 39777462 PMC: 11704957. DOI: 10.1093/nar/gkae1245.

