
Parameter-efficient Fine-tuning on Large Protein Language Models Improves Signal Peptide Prediction

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2024 Jul 26
PMID: 39060029
Abstract

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for categories with limited annotated data. We present PEFT-SP, a parameter-efficient fine-tuning (PEFT) framework for SP prediction that makes effective use of pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the evolutionary knowledge of protein sequences captured by PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SP categories with few training samples and an overall MCC gain of 6.1%. We also applied two other PEFT methods, prompt tuning and adapter tuning, to ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning also improves on state-of-the-art results, with an MCC gain of up to 28.1% for SP categories with few training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during training, making it possible to adapt larger and more powerful protein models for SP prediction.
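LoRA keeps a pretrained weight matrix W0 frozen and learns only a low-rank update, so an adapted layer computes h = W0 x + (alpha / r) B A x. The sketch below illustrates that idea for a single linear layer in plain Python. The class `LoRALinear`, the 64 x 64 layer, the rank r = 8 and the scaling alpha = 16 are hypothetical choices for illustration, not the configuration or code used by PEFT-SP or ESM-2.

```python
import random

# Hypothetical minimal LoRA layer in plain Python (no deep-learning framework).
# Assumptions: one frozen linear layer, illustrative sizes, rank and scaling;
# this is not the PEFT-SP or ESM-2 implementation.


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    def __init__(self, w0, r, alpha, seed=0):
        self.w0 = w0  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        # A (r x d_in) starts with small random values and B (d_out x r) with
        # zeros, so before training the layer reproduces W0 exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        # h = W0 x + (alpha / r) * B (A x); only A and B would receive gradients.
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def merged_weight(self):
        # After training, W0 + (alpha / r) * B A can be folded into one matrix,
        # so inference needs no extra matrix multiplications.
        r = len(self.a)
        return [
            [
                self.w0[i][j] + self.scale * sum(self.b[i][k] * self.a[k][j] for k in range(r))
                for j in range(len(self.w0[0]))
            ]
            for i in range(len(self.w0))
        ]


def full_params(w0):
    """Number of weights updated by full fine-tuning of the same layer."""
    return len(w0) * len(w0[0])


# Usage: a 64 x 64 identity matrix stands in for a pretrained weight.
w0 = [[1.0 if i == j else 0.0 for j in range(64)] for i in range(64)]
layer = LoRALinear(w0, r=8, alpha=16)
x = [float(i) for i in range(64)]
print(layer.forward(x) == matvec(w0, x))  # True, because B is zero at init
print(layer.trainable_params(), full_params(w0))  # 1024 4096
```

With these illustrative sizes, LoRA trains 1,024 of the layer's 4,096 weights; in general it trains r(d_in + d_out) values instead of d_in x d_out. The abstract's comparison with adapter tuning concerns compute and memory during training, which this parameter count alone does not fully capture.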

Citing Articles

Leveraging large language models for peptide antibiotic design.

Guan C, Fernandes F, Franco O, de la Fuente-Nunez C. Cell Rep Phys Sci. 2025; 6(1).

PMID: 39949833 PMC: 11823563. DOI: 10.1016/j.xcrp.2024.102359.


PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates.

Jiang R, Yue Z, Shang L, Wang D, Wei N. Metab Eng Commun. 2024; 19:e00248.

PMID: 39310048 PMC: 11414552. DOI: 10.1016/j.mec.2024.e00248.
