DeepSol: a Deep Learning Framework for Sequence-based Protein Solubility Prediction

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Mar 20
PMID 29554211
Citations 75
Abstract

Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence.
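To make the phrase "k-mer structure" concrete, the following is a minimal sketch of how a protein sequence could be split into overlapping k-mers and integer-encoded before being passed to a convolutional model. It is an illustration only, not DeepSol's actual preprocessing, which the abstract does not specify. The function names `kmers`, `build_vocab`, and `encode`, the example sequence, and the choice of k = 3 are all assumptions.

```python
def kmers(sequence, k):
    """Return all overlapping k-mers of `sequence`, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k):
    """Map each distinct k-mer to an integer id; id 0 is reserved for unknowns."""
    vocab = {}
    for seq in sequences:
        for kmer in kmers(seq, k):
            if kmer not in vocab:
                vocab[kmer] = len(vocab) + 1
    return vocab


def encode(sequence, k, vocab):
    """Turn a sequence into a list of k-mer ids (0 for k-mers not in vocab)."""
    return [vocab.get(kmer, 0) for kmer in kmers(sequence, k)]


# Hypothetical toy sequence, chosen only for illustration.
toy_sequence = "MKTAYIAK"
toy_vocab = build_vocab([toy_sequence], k=3)
toy_ids = encode(toy_sequence, k=3, vocab=toy_vocab)
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the toy sequence of 8 residues produces 6 ids. In a typical setup, such ids would feed an embedding layer followed by one-dimensional convolutions.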

Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins.
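For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) for a binary soluble/insoluble classifier from confusion-matrix counts. The labels are made-up toy data, not DeepSol results, and the function names are assumptions chosen for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = soluble, 0 = insoluble)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only (not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
toy_accuracy = accuracy(*counts)
toy_mcc = mcc(*counts)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy.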

Availability And Implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

ProG-SOL: Predicting Protein Solubility Using Protein Embeddings and Dual-Graph Convolutional Networks.

Li G, Zhang N, Fan L ACS Omega. 2025; 10(4):3910-3916.

PMID: 39926503 PMC: 11800053. DOI: 10.1021/acsomega.4c09688.


Advances in cyclotide research: bioactivity to cyclotide-based therapeutics.

Grover A, Singh S, Sindhu S, Lath A, Kumar S Mol Divers. 2025; .

PMID: 39862350 DOI: 10.1007/s11030-025-11113-w.


Benchmarking protein language models for protein crystallization.

Mall R, Kaushik R, Martinez Z, Thomson M, Castiglione F Sci Rep. 2025; 15(1):2381.

PMID: 39827171 PMC: 11743144. DOI: 10.1038/s41598-025-86519-5.


Protein engineering in the deep learning era.

Zhou B, Tan Y, Hu Y, Zheng L, Zhong B, Hong L mLife. 2025; 3(4):477-491.

PMID: 39744096 PMC: 11685842. DOI: 10.1002/mlf2.12157.


Synergistic growth suppression of Fusarium oxysporum MLY127 through Dimethachlon Nanoencapsulation and co-application with Bacillus velezensis MLY71.

Yang L, Gao J, Xiang D, Hu X, Lin G, Liu Y Sci Rep. 2024; 14(1):29967.

PMID: 39623089 PMC: 11612293. DOI: 10.1038/s41598-024-81356-4.

