Predicting Co-complexed Protein Pairs from Heterogeneous Data

Overview
Specialty Biology
Date 2008 Apr 19
PMID 18421371
Citations 29
Abstract

Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates.

Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC(50) of 0.937. Assuming a negative-to-positive ratio of 600:1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs.

An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.
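
To make the idea of a diffusion kernel on an interaction network concrete, the following is a minimal sketch on a toy graph. It assumes a commonly used form, K = exp(-beta * L) with L the graph Laplacian; the abstract does not state the exact formulation or parameter values used in the paper, so the function names, the `beta` value, and the four-node network here are purely illustrative. The matrix exponential is written out in plain Python to keep the example self-contained; in practice a library routine such as `scipy.linalg.expm` would be the usual choice.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(M, terms=20, squarings=6):
    """Approximate exp(M) by a truncated Taylor series with scaling and squaring."""
    n = len(M)
    scale = 2.0 ** squarings
    S = [[x / scale for x in row] for row in M]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds S^k / k!
        term = [[x / k for x in row] for row in matmul(term, S)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    # Undo the scaling: exp(M) = (exp(M / 2^s))^(2^s)
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def diffusion_kernel(n, edges, beta=1.0):
    """Assumed form: K = exp(-beta * L). beta is an illustrative parameter."""
    L = laplacian(n, edges)
    return expm([[-beta * x for x in row] for row in L])


# Hypothetical toy network: a path of four proteins, 0 - 1 - 2 - 3.
K = diffusion_kernel(4, [(0, 1), (1, 2), (2, 3)], beta=1.0)
```

In this sketch, `K[i][j]` is nonzero even for proteins that are not directly connected and shrinks as the proteins get farther apart in the network, which is the sense in which the kernel uses the full topology rather than only direct neighbors. Kernels built this way from several networks can be added entry by entry, since a sum of valid kernels is again a valid kernel.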

Citing Articles

Defining the extent of gene function using ROC curvature.

Fischer S, Gillis J Bioinformatics. 2022; 38(24):5390-5397.

PMID: 36271855 PMC: 9750128. DOI: 10.1093/bioinformatics/btac692.


Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference.

Ko Y, Kim J, Rodriguez-Zas S Genes Genomics. 2019; 41(5):547-555.

PMID: 30741379 DOI: 10.1007/s13258-019-00789-8.


Machine learning applications in genetics and genomics.

Libbrecht M, Noble W Nat Rev Genet. 2015; 16(6):321-32.

PMID: 25948244 PMC: 5204302. DOI: 10.1038/nrg3920.


Probabilistic inference of biological networks via data integration.

Rogers M, Campbell C, Ying Y Biomed Res Int. 2015; 2015:707453.

PMID: 25874225 PMC: 4385617. DOI: 10.1155/2015/707453.


Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms.

Park C, Krishnan A, Zhu Q, Wong A, Lee Y, Troyanskaya O Bioinformatics. 2014; 31(7):1093-101.

PMID: 25431329 PMC: 4804827. DOI: 10.1093/bioinformatics/btu786.

