
Mining Protein Sequences for Motifs

Overview
Journal: J Comput Biol
Date: 2002 Dec 19
PMID: 12487759
Citations: 33
Abstract

We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. The statistical significance of the detection results is ensured by determining the various parameters of the algorithm statistically. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with those of existing programs. In addition, the GYM program provides further useful information about a given protein sequence.
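
The pattern-dictionary idea described in the abstract can be illustrated with a minimal Python sketch. It is not the GYM implementation, and everything in it is an assumption made for illustration: a "pattern" is simplified to a pair of residues at two fixed motif positions, a pattern counts as "good" when it occurs in at least `min_support` training motifs, and a window of a new sequence is reported when it matches at least `min_score` dictionary patterns. The function names and both thresholds are hypothetical; the abstract states only that the real parameters are determined statistically.

```python
from collections import Counter
from itertools import combinations


def build_pattern_dictionary(aligned_motifs, min_support):
    """Collect residue pairs that recur across an aligned training set.

    A pattern is the tuple (pos_i, res_i, pos_j, res_j). It is kept when it
    appears in at least `min_support` of the equal-length training motifs.
    (Simplifying assumption: pairs only, with a fixed support threshold.)
    """
    counts = Counter()
    for motif in aligned_motifs:
        for (i, a), (j, b) in combinations(enumerate(motif), 2):
            counts[(i, a, j, b)] += 1
    return {pattern for pattern, n in counts.items() if n >= min_support}


def score_window(window, dictionary):
    """Count how many dictionary patterns the window matches."""
    return sum(
        1 for (i, a, j, b) in dictionary
        if window[i] == a and window[j] == b
    )


def detect_motifs(sequence, dictionary, motif_length, min_score):
    """Slide a window over the sequence and report (start, score) hits."""
    hits = []
    for start in range(len(sequence) - motif_length + 1):
        window = sequence[start:start + motif_length]
        score = score_window(window, dictionary)
        if score >= min_score:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy aligned training set of length-4 "motifs" (made-up data).
    training = ["ACDE", "ACDF", "GCDE"]
    dictionary = build_pattern_dictionary(training, min_support=2)
    print(detect_motifs("KKACDEKK", dictionary, motif_length=4, min_score=4))
```

Scoring by a simple match count keeps the sketch short. A statistically grounded method would instead calibrate the support and score thresholds against background sequences, which is the role the abstract assigns to its parameter-determination step.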
