
Prediction of Protein Solubility Based on Sequence Physicochemical Patterns and Distributed Representation Information with DeepSoluE

Overview
Journal BMC Biol
Publisher Biomed Central
Specialty Biology
Date 2023 Jan 24
PMID 36694239
Abstract

Background: Protein solubility is a precondition for efficient heterologous protein expression, which underlies most industrial applications, and for functional interpretation in basic research. However, the recurrent formation of inclusion bodies remains a persistent roadblock in protein science and industry: only about a quarter of proteins can be successfully expressed in soluble form. Although numerous solubility prediction models have been developed, their performance remains unsatisfactory given the rapid growth in available protein sequences. Hence, accurate predictors are needed that can prioritize highly soluble proteins and so reduce the cost of experimental work.

Results: In this study, we developed a novel tool, DeepSoluE, which predicts protein solubility using a long short-term memory (LSTM) network with hybrid features composed of physicochemical patterns and distributed representations of amino acids. Comparisons showed that the proposed model achieved more accurate and more balanced performance than existing tools. Furthermore, we explored the specific features that have a dominant impact on model performance, as well as their interaction effects.
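
To make the idea of hybrid features concrete, the following is a minimal sketch of how a sequence might be turned into two kinds of input: a fixed-length vector of global physicochemical descriptors, and integer tokens that an embedding layer placed before an LSTM could map to distributed representations. It is not the DeepSoluE feature set. The function names, the choice of amino acid composition plus mean Kyte-Doolittle hydropathy, and the padding scheme are all assumptions made for illustration.

```python
"""Toy hybrid feature extraction for a protein sequence (illustrative only)."""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Kyte-Doolittle hydropathy scale, used here as one example of a
# physicochemical property. Assumption: not DeepSoluE's actual descriptors.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Index 0 is reserved for padding; all unknown residues share one extra index.
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN_INDEX = len(AMINO_ACIDS) + 1


def amino_acid_composition(seq):
    """Fraction of each standard residue, in AMINO_ACIDS order."""
    seq = seq.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def mean_hydropathy(seq):
    """Average Kyte-Doolittle score over the standard residues present."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    return sum(scores) / len(scores) if scores else 0.0


def encode_tokens(seq, max_len):
    """Integer-encode a sequence, truncating or zero-padding to max_len."""
    ids = [TOKEN_INDEX.get(aa, UNKNOWN_INDEX) for aa in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def hybrid_features(seq, max_len=50):
    """Return global descriptors and per-residue tokens (hypothetical helper)."""
    return {
        "global": amino_acid_composition(seq) + [mean_hydropathy(seq)],
        "tokens": encode_tokens(seq, max_len),
    }


if __name__ == "__main__":
    feats = hybrid_features("MKTAYIAKQR", max_len=12)
    print(len(feats["global"]), feats["tokens"])
```

In a setup like this, the token list would typically feed an embedding layer followed by the recurrent network, while the global vector would be concatenated at a later dense layer; how DeepSoluE actually combines its features is not specified in this abstract.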

Conclusions: DeepSoluE is suitable for predicting protein solubility in E. coli; it serves as a bioinformatics tool for prescreening potentially soluble targets to reduce the cost of wet-lab experiments. The webserver is freely accessible at http://lab.malab.cn/~wangchao/softs/DeepSoluE/

Citing Articles

One Health Approach to the Computational Design of a Lipoprotein-Based Multi-Epitope Vaccine Against Human and Livestock Tuberculosis.

Shey R, Nchanji G, Stong T, Yaah N, Shintouo C, Yengo B Int J Mol Sci. 2025; 26(4).

PMID: 40004053 PMC: 11855821. DOI: 10.3390/ijms26041587.


ProG-SOL: Predicting Protein Solubility Using Protein Embeddings and Dual-Graph Convolutional Networks.

Li G, Zhang N, Fan L ACS Omega. 2025; 10(4):3910-3916.

PMID: 39926503 PMC: 11800053. DOI: 10.1021/acsomega.4c09688.


Protein engineering in the deep learning era.

Zhou B, Tan Y, Hu Y, Zheng L, Zhong B, Hong L mLife. 2025; 3(4):477-491.

PMID: 39744096 PMC: 11685842. DOI: 10.1002/mlf2.12157.


In silico design and assessment of a multi-epitope peptide vaccine against multidrug-resistant .

Sah S, Gupta S, Bhardwaj N, Gautam L, Capalash N, Sharma P In Silico Pharmacol. 2024; 13(1):7.

PMID: 39726905 PMC: 11668725. DOI: 10.1007/s40203-024-00292-3.


analysis for the development of multi-epitope vaccines against .

Yun J, Kim A, Kim S, Shin E, Ha S, Kim D Front Immunol. 2024; 15:1474346.

PMID: 39624097 PMC: 11609213. DOI: 10.3389/fimmu.2024.1474346.

