Unsupervised Evolution of Protein and Antibody Complexes with a Structure-informed Language Model
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without modeling individual functional tasks. We also demonstrate that ESM-IF1, which was trained only on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
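As an illustration of the general idea of ranking candidate variants with a structure-conditioned model, the following is a minimal sketch, not the authors' pipeline. It assumes that some model has already produced, for each position of the wild-type sequence, a log-probability for every amino acid given the backbone. The table `log_probs`, the helper `toy_log_probs`, and the function `rank_substitutions` are hypothetical names introduced here; no ESM-IF1 calls are shown, and the toy numbers are not real model outputs.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_log_probs(length, favored):
    """Build a stand-in log-probability table (NOT real model output).

    Every residue gets weight 1, except the residue named in `favored`
    for a given position, which gets weight 5. Weights are normalized
    per position and converted to log-probabilities.
    """
    table = []
    for pos in range(length):
        weights = {aa: 1.0 for aa in AMINO_ACIDS}
        if pos in favored:
            weights[favored[pos]] = 5.0
        total = sum(weights.values())
        table.append({aa: math.log(w / total) for aa, w in weights.items()})
    return table


def rank_substitutions(wild_type, log_probs, top_k=5):
    """Rank single substitutions by log-likelihood ratio versus wild type.

    log_probs[i][aa] is assumed to be the structure-conditioned
    log-probability of residue `aa` at 0-based position i.
    """
    scored = []
    for i, wt in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            score = log_probs[i][aa] - log_probs[i][wt]
            # Name variants as <wild type><1-based position><mutant>.
            scored.append((f"{wt}{i + 1}{aa}", score))
    # Highest log-likelihood ratio first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


wild_type = "ACD"  # toy sequence, for illustration only
log_probs = toy_log_probs(len(wild_type), favored={1: "W"})
top = rank_substitutions(wild_type, log_probs, top_k=3)
```

In this toy setup the only substitution with a positive score is `C2W`, because it is the only position where a non-wild-type residue was given extra weight. In practice, a short list of top-ranked substitutions would be carried forward for experimental screening.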