Dissecting Protein Loops with a Statistical Scalpel Suggests a Functional Implication of Some Structural Motifs
Background: One strategy for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function.
Results: Here, we present a systematic extraction of seven-residue structural motifs from protein loops and explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which simplifies protein structures into one-dimensional sequences, combined with advanced pattern statistics adapted to short sequences. Motifs of interest are selected as those that are significantly over-represented in the loops of SCOP superfamilies. We discovered two types of such motifs: (i) ubiquitous motifs, shared by several superfamilies, and (ii) superfamily-specific motifs, over-represented in only a few superfamilies. A comparison of ubiquitous motifs with known small structural motifs shows that they include well-described motifs such as turn, niche, and nest motifs. A comparison between superfamily-specific motifs and Swiss-Prot biological annotations reveals that some of them correspond to functional sites involved in binding small ligands, such as ATP/GTP, NAD(P) and SAH/SAM.
Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
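The motif-selection step described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract refers to advanced pattern statistics adapted to short sequences, whereas this sketch substitutes a plain binomial tail test against a background frequency. It also assumes that each structural letter describes a four-residue fragment, so that a seven-residue motif corresponds to a four-letter word; that correspondence, the function names, the threshold `alpha`, and the toy loop strings are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def count_words(sequences, k=4):
    """For each k-letter word, count the sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each word is counted once per sequence.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    return counts


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def over_represented(family_loops, background_loops, k=4, alpha=0.01):
    """Return {word: p-value} for words enriched in one superfamily's loops.

    Simplified stand-in for the pattern statistics mentioned in the abstract:
    the background frequency of each word is estimated from all loops
    (with add-one smoothing) and compared to its count in the family.
    No multiple-testing correction is applied in this sketch.
    """
    fam = count_words(family_loops, k)
    bg = count_words(background_loops, k)
    n_fam, n_bg = len(family_loops), len(background_loops)
    hits = {}
    for word, c in fam.items():
        p = (bg.get(word, 0) + 1) / (n_bg + 2)
        pval = binomial_tail(n_fam, c, p)
        if pval < alpha:
            hits[word] = pval
    return hits


# Toy, invented loop encodings (not real HMM-SA output).
family = ["ABCDE", "XABCD", "ABCDQ", "ZABCD"]
background = ["QRSTU"] * 19 + ["ABCDE"]
print(over_represented(family, background))
```

In a real analysis the family and background loops would come from HMM-SA encodings of SCOP domains, and the many words tested would call for a multiple-testing correction.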