Parameter-efficient Fine-tuning on Large Protein Language Models Improves Signal Peptide Prediction

Overview

Journal Genome Res

Specialty Genetics

Date 2024 Jul 26

PMID 39060029

Authors

Shuai Zeng

Duolin Wang

Lei Jiang

Dong Xu

Abstract

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for categories with limited annotated data. We present PEFT-SP, a parameter-efficient fine-tuning (PEFT) framework for SP prediction that makes effective use of pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the evolutionary knowledge of protein sequences captured by PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SPs with small training samples and an overall MCC gain of 6.1%. We also employed two other PEFT methods, prompt tuning and adapter tuning, in ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning can also improve on state-of-the-art results, with an MCC gain of up to 28.1% for SPs with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during training, making it possible to adapt larger and more powerful protein models for SP prediction.
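To make the low-rank adaptation idea concrete, the following is a minimal sketch in plain Python of how a LoRA update modifies a frozen weight matrix and why it trains far fewer parameters. It is not the PEFT-SP implementation: the helper names (`matmul`, `lora_weight`, `trainable_params`), the toy dimensions, the rank, the scaling factor `alpha`, and the 1280-wide layer used for the parameter count are all illustrative assumptions, not values taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    W is the frozen pretrained weight (d_out x d_in); B (d_out x r) and
    A (r x d_in) are the only matrices that would be trained.
    """
    delta = matmul(B, A)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Compare full fine-tuning of one weight matrix with its LoRA factors."""
    full = d_out * d_in
    lora = r * (d_out + d_in)
    return full, lora


# Toy dimensions chosen for illustration only (assumption).
d_out, d_in, r, alpha = 8, 8, 2, 4
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
# B starts at zero, so the adapted weight equals W before any training step.
B = [[0.0] * r for _ in range(d_out)]

W_adapted = lora_weight(W, A, B, alpha, r)

# Parameter count for one hypothetical 1280 x 1280 projection at rank 8.
full, lora = trainable_params(1280, 1280, 8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")
```

Because only `A` and `B` would receive gradients, the optimizer state and gradient memory scale with `r * (d_out + d_in)` rather than `d_out * d_in`, which is consistent with the abstract's point that LoRA is lighter to train than adapter tuning.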

Citing Articles

Leveraging large language models for peptide antibiotic design.

Guan C, Fernandes F, Franco O, de la Fuente-Nunez C. Cell Rep Phys Sci. 2025; 6(1).

PMID: 39949833 PMC: 11823563. DOI: 10.1016/j.xcrp.2024.102359.

PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates.

Jiang R, Yue Z, Shang L, Wang D, Wei N. Metab Eng Commun. 2024; 19:e00248.

PMID: 39310048 PMC: 11414552. DOI: 10.1016/j.mec.2024.e00248.
