An FPT Approach for Predicting Protein Localization from Yeast Genomic Data

Overview
Journal PLoS One
Date 2011 Feb 2
PMID 21283516
Citations 2
Abstract

Accurately predicting where proteins localize is essential for determining their functions within cellular compartments. Because of continuous and rapid progress in genomics and proteomics, more data are available now than ever before. In parallel, data mining methods have been developed and refined to handle this experimental windfall, allowing the scientific community to address long-standing questions such as protein localization quantitatively. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We derive a series of rules from the features of yeast genomic data. The prediction accuracy of mFPT is benchmarked against other commonly used methods, such as Bayesian networks and logistic regression, under various statistical measures. Our results show that mFPT performed better than these other approaches in predicting protein localization. With the minimum hit-rate set to 0.65, we obtained 138 proteins for which the mFPT prediction differed from that of the simple naive Bayesian method (SNB). Among these 138 proteins, we present novel location predictions for 17 proteins that currently have no defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and against related localization predictions by others. Our method is general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as in other fields of research.
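To make the idea of rule mining with a minimum hit-rate concrete, the following is a minimal sketch and not the authors' mFPT implementation. It rests on several assumptions that the abstract does not confirm: the hit-rate is treated as rule confidence (the fraction of proteins matching a feature pattern that carry the predicted label), frequent patterns are counted by brute-force enumeration rather than with an actual FP-tree, and "minimum set of rules" is approximated by dropping any rule whose pattern has a proper sub-pattern that already predicts the same label. The feature names, labels, records, and the `min_support` and `max_len` parameters are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (set of binary features, localization label).
# Feature names and labels are illustrative and not taken from the paper.
RECORDS = [
    ({"has_NLS", "high_expr"}, "nucleus"),
    ({"has_NLS", "low_expr"}, "nucleus"),
    ({"has_NLS", "high_expr"}, "nucleus"),
    ({"has_signal_peptide", "high_expr"}, "secretory"),
    ({"has_signal_peptide", "low_expr"}, "secretory"),
    ({"high_expr"}, "cytoplasm"),
    ({"low_expr"}, "cytoplasm"),
    ({"has_NLS", "low_expr"}, "cytoplasm"),
]


def mine_rules(records, min_support=2, min_hit_rate=0.65, max_len=2):
    """Return (pattern, label, hit_rate) rules that pass both thresholds.

    Brute-force pattern counting stands in for the FP-tree here; it yields
    the same counts on small data but does not scale the way an FP-tree does.
    """
    pattern_counts = Counter()        # how many records contain the pattern
    pattern_label_counts = Counter()  # ... and also carry a given label
    for features, label in records:
        for size in range(1, max_len + 1):
            for pattern in combinations(sorted(features), size):
                pattern_counts[pattern] += 1
                pattern_label_counts[(pattern, label)] += 1

    rules = []
    for (pattern, label), hits in pattern_label_counts.items():
        total = pattern_counts[pattern]
        hit_rate = hits / total  # assumed meaning of "hit-rate": confidence
        if total >= min_support and hit_rate >= min_hit_rate:
            rules.append((pattern, label, hit_rate))
    return rules


def minimal_rules(rules):
    """Drop rules made redundant by a shorter rule predicting the same label.

    This pruning criterion is an assumption used for illustration only.
    """
    kept = []
    for pattern, label, hit_rate in rules:
        redundant = any(
            set(other) < set(pattern) and other_label == label
            for other, other_label, _ in rules
        )
        if not redundant:
            kept.append((pattern, label, hit_rate))
    return kept


if __name__ == "__main__":
    for pattern, label, hit_rate in sorted(minimal_rules(mine_rules(RECORDS))):
        print(f"{' & '.join(pattern)} -> {label} (hit-rate {hit_rate:.2f})")
```

On this toy data the specific rule `has_NLS & high_expr -> nucleus` is pruned because `has_NLS -> nucleus` already clears the 0.65 threshold. A real pipeline might instead keep the more specific rule when its hit-rate is higher; that trade-off is a design choice this sketch does not settle.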

Citing Articles

Deciphering the Host-Pathogen Interactome of the Wheat-Common Bunt System: A Step towards Enhanced Resilience in Next Generation Wheat.

Kataria R, Kaundal R Int J Mol Sci. 2022; 23(5).

PMID: 35269732 PMC: 8910311. DOI: 10.3390/ijms23052589.


An ensemble classifier for eukaryotic protein subcellular location prediction using gene ontology categories and amino acid hydrophobicity.

Li L, Zhang Y, Zou L, Li C, Yu B, Zheng X PLoS One. 2012; 7(1):e31057.

PMID: 22303481 PMC: 3268814. DOI: 10.1371/journal.pone.0031057.
