
Enhancing Efficiency of Protein Language Models with Minimal Wet-lab Data Through Few-shot Learning

Overview
Journal: Nat Commun
Specialty: Biology
Date: 2024 Jul 2
PMID: 38956442
Abstract

Accurately modeling protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
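The training recipe described in the abstract, a listwise learning-to-rank objective on top of a parameter-efficiently fine-tuned protein language model, trained on only tens of labeled single-site mutants, can be illustrated with a short PyTorch sketch. Everything below is an assumption made for illustration, not the authors' implementation: a random linear layer stands in for the pre-trained language model, LoRALinear and listmle_loss are hypothetical names for one possible low-rank adapter and one common listwise ranking loss (ListMLE), and the meta-transfer learning stage of FSFP is omitted entirely.

    # Minimal sketch (not the authors' code): parameter-efficient fine-tuning of a
    # frozen "protein language model" stand-in with a listwise learning-to-rank loss,
    # using a few dozen labeled single-site mutants as training data.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update (LoRA-style)."""
        def __init__(self, base: nn.Linear, rank: int = 4):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # freeze the pre-trained weights
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)   # start as a zero (identity) update

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x))

    def listmle_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """ListMLE: negative log-likelihood of the permutation sorted by true fitness."""
        order = torch.argsort(labels, descending=True)
        s = scores[order]
        # log-sum-exp over the remaining items, computed from the end of the list
        logcumsum = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
        return (logcumsum - s).sum()

    # Toy stand-in for a pre-trained PLM encoder feeding a fitness-scoring head.
    embed_dim, n_mutants = 64, 20
    encoder = nn.Linear(embed_dim, embed_dim)    # pretend these weights are pre-trained
    model = nn.Sequential(LoRALinear(encoder, rank=4), nn.ReLU(), nn.Linear(embed_dim, 1))

    # Tens of labeled single-site mutants (random features and fitness values here).
    x = torch.randn(n_mutants, embed_dim)
    y = torch.randn(n_mutants)

    optim = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
    for step in range(100):
        optim.zero_grad()
        loss = listmle_loss(model(x).squeeze(-1), y)
        loss.backward()
        optim.step()

A listwise objective of this kind needs only the relative ordering of the labeled mutants rather than calibrated fitness values, which is one reason ranking losses are attractive when only a handful of noisy measurements are available.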

Citing Articles

AI-enabled alkaline-resistant evolution of protein to apply in mass production.

Kang L, Wu B, Zhou B, Tan P, Kang Y, Yan Y. eLife. 2025; 13.

PMID: 39968946 PMC: 11839161. DOI: 10.7554/eLife.102788.


Discovery of highly active kynureninases for cancer immunotherapy through protein language model.

Eom H, Park S, Cho K, Lee J, Kim H, Kim S. Nucleic Acids Res. 2025; 53(1).

PMID: 39777462 PMC: 11704957. DOI: 10.1093/nar/gkae1245.


Protein engineering in the deep learning era.

Zhou B, Tan Y, Hu Y, Zheng L, Zhong B, Hong L. mLife. 2025; 3(4):477-491.

PMID: 39744096 PMC: 11685842. DOI: 10.1002/mlf2.12157.


Revolutionizing Molecular Design for Innovative Therapeutic Applications through Artificial Intelligence.

Son A, Park J, Kim W, Yoon Y, Lee S, Park Y. Molecules. 2024; 29(19).

PMID: 39407556 PMC: 11477718. DOI: 10.3390/molecules29194626.


Integrating Computational Design and Experimental Approaches for Next-Generation Biologics.

Son A, Park J, Kim W, Lee W, Yoon Y, Ji J. Biomolecules. 2024; 14(9).

PMID: 39334841 PMC: 11430650. DOI: 10.3390/biom14091073.

