An Integrated Machine Learning System to Computationally Screen Protein Databases for Protein Binding Peptide Ligands

Overview
Date 2006 Apr 1
PMID 16574641
Citations 7
Abstract

A fairly large set of protein interactions is mediated by families of peptide-binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low-abundance species, because they are suppressed by high-abundance species. An ideal way to study protein-protein interactions is to screen protein sequence databases with high-throughput computational approaches, directing the validating experiments toward the most promising peptides. Predictors that merely perform well in cross-validation have not been good enough to screen protein databases. In the current study we built integrated machine learning systems using three novel coding methods and screened the Swiss-Prot and GenBank protein databases for potential ligands of 10 SH3 and three PDZ domains. A large fraction of the predictions has already been experimentally confirmed by independent research groups, indicating satisfactory generalization capability for future applications in identifying protein interactions.
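The database-screening idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not describe the three coding methods or the learning systems, so plain one-hot coding and a toy scoring function stand in for them, and the names `one_hot`, `windows`, `screen`, and `toy_score` are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide: str) -> List[int]:
    """Flatten a peptide into a binary vector, 20 entries per position.

    Assumption: one-hot coding is a stand-in, NOT one of the coding
    methods used in the study.
    """
    vec: List[int] = []
    for residue in peptide:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def windows(sequence: str, width: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for each window made only of standard residues."""
    for start in range(len(sequence) - width + 1):
        peptide = sequence[start:start + width]
        if all(r in AMINO_ACIDS for r in peptide):
            yield start, peptide


def screen(
    database: Dict[str, str],
    score: Callable[[List[int]], float],
    width: int,
    threshold: float,
) -> List[Tuple[str, int, str, float]]:
    """Score every window in every protein; return hits, best score first."""
    hits = []
    for protein_id, sequence in database.items():
        for start, peptide in windows(sequence, width):
            s = score(one_hot(peptide))
            if s >= threshold:
                hits.append((protein_id, start, peptide, s))
    hits.sort(key=lambda h: h[3], reverse=True)
    return hits


def toy_score(vec: List[int]) -> float:
    """Hypothetical scorer: counts prolines at window positions 0 and 3."""
    p = AMINO_ACIDS.index("P")
    return float(vec[0 * 20 + p] + vec[3 * 20 + p])


# Hypothetical two-entry "database" keyed by made-up identifiers.
database = {"protA": "MAPLKPAG", "protB": "GGGGGG"}
for hit in screen(database, toy_score, width=4, threshold=2.0):
    print(hit)
```

In practice, `toy_score` would be replaced by a trained predictor, and the ranked hits would serve as the shortlist of peptides to take forward to validating experiments.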

Citing Articles

A mathematical representation of protein binding sites using structural dispersion of atoms from principal axes for classification of binding ligands.

Premarathna G, Ellingson L. PLoS One. 2021; 16(4):e0244905.

PMID: 33831020 PMC: 8031081. DOI: 10.1371/journal.pone.0244905.


Identification of methyllysine peptides binding to chromobox protein homolog 6 chromodomain in the human proteome.

Li N, Stein R, He W, Komives E, Wang W. Mol Cell Proteomics. 2013; 12(10):2750-60.

PMID: 23842000 PMC: 3790288. DOI: 10.1074/mcp.O112.025015.


Characterization of domain-peptide interaction interface: prediction of SH3 domain-mediated protein-protein interaction network in yeast by generic structure-based models.

Hou T, Li N, Li Y, Wang W. J Proteome Res. 2012; 11(5):2982-95.

PMID: 22468754 PMC: 3345086. DOI: 10.1021/pr3000688.


DomPep--a general method for predicting modular domain-mediated protein-protein interactions.

Li L, Zhao B, Du J, Zhang K, Ling C, Li S. PLoS One. 2011; 6(10):e25528.

PMID: 22003397 PMC: 3189207. DOI: 10.1371/journal.pone.0025528.


Prediction of protease substrates using sequence and structure features.

Barkan D, Hostetter D, Mahrus S, Pieper U, Wells J, Craik C. Bioinformatics. 2010; 26(14):1714-22.

PMID: 20505003 PMC: 2894511. DOI: 10.1093/bioinformatics/btq267.