SCLpred: Protein Subcellular Localization Prediction by N-to-1 Neural Networks
Summary: Knowledge of the subcellular location of a protein provides valuable information about its function and its possible interactions with other proteins. In the post-genomic era, fast and accurate predictors of subcellular location are required if the abundance of available sequence data is to be fully exploited. We have developed a subcellular localization predictor (SCLpred), which assigns a protein to one of four classes for animals and fungi (secreted, cytoplasm, nucleus and mitochondrion) or one of five classes for plants (the same four plus chloroplast) using machine learning models trained on large non-redundant sets of protein sequences. The algorithm powering SCLpred is a novel neural network (N-to-1 Neural Network, or N1-NN) that we have developed. It maps a whole sequence into a single property (a functional class, in this work) without resorting to predefined transformations, instead adaptively compressing the sequence into a hidden feature vector. We benchmark SCLpred against other publicly available predictors on two benchmark sets, one of which is a new subset of Swiss-Prot Release 2010_06, and show that SCLpred surpasses the state of the art. The N1-NN algorithm is fully general and may be applied to a host of problems of similar shape, that is, problems in which a whole sequence needs to be mapped into a fixed-size array of properties. The adaptive compression it performs may also shed light on the space of protein sequences.
Availability: The predictive systems described in this article are publicly available as a web server at http://distill.ucd.ie/distill/.
Contact: gianluca.pollastri@ucd.ie.
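Illustration: the N-to-1 idea described in the summary (a variable-length sequence compressed into a fixed-size hidden feature vector, which is then mapped to a class) can be sketched roughly as follows. This is not the authors' implementation. The one-hot residue encoding, the window half-width, the tanh hidden layer, the averaging over positions, the layer sizes and the random untrained weights are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Plant class set as listed in the summary.
CLASSES = ["secreted", "cytoplasm", "nucleus", "mitochondrion", "chloroplast"]


def one_hot(residue):
    """Encode one residue as a 20-dim vector; unknown letters become all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def init_matrix(rows, cols, rng):
    """Small random weights (assumption: a real model would learn these)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def encode_windows(sequence, half_width):
    """Yield one flat one-hot window per residue, zero-padded at both ends."""
    padding = [0.0] * len(AMINO_ACIDS)
    for centre in range(len(sequence)):
        window = []
        for pos in range(centre - half_width, centre + half_width + 1):
            inside = 0 <= pos < len(sequence)
            window.extend(one_hot(sequence[pos]) if inside else padding)
        yield window


def compress(sequence, w_hidden, half_width):
    """Map a sequence of any length N to one fixed-size feature vector.

    The same hidden layer is applied to every window and the results are
    averaged, so the output size does not depend on N.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    features = [0.0] * len(w_hidden)
    for window in encode_windows(sequence, half_width):
        activations = [math.tanh(a) for a in matvec(w_hidden, window)]
        features = [f + a for f, a in zip(features, activations)]
    return [f / len(sequence) for f in features]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sequence, w_hidden, w_output, half_width):
    """Compress the sequence, then map the feature vector to class probabilities."""
    features = compress(sequence, w_hidden, half_width)
    return softmax(matvec(w_output, features))


rng = random.Random(0)
HALF_WIDTH = 2    # assumption: 5-residue windows
N_FEATURES = 8    # assumption: size of the hidden feature vector
w_hidden = init_matrix(N_FEATURES, (2 * HALF_WIDTH + 1) * len(AMINO_ACIDS), rng)
w_output = init_matrix(len(CLASSES), N_FEATURES, rng)

probs = predict("MKTLLVAGSW", w_hidden, w_output, HALF_WIDTH)
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

Because the weights here are random, the printed probabilities carry no biological meaning. The point of the sketch is only the shape of the computation: sequences of different lengths all yield a feature vector of the same size, which a single output layer can then classify.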