Which Clustering Algorithm is Better for Predicting Protein Complexes?

Overview
Journal BMC Res Notes
Publisher Biomed Central
Date 2011 Dec 22
PMID 22185599
Citations 11
Abstract

Background: Protein-protein interactions (PPIs) play a key role in determining the outcome of most cellular processes. Correctly identifying and characterizing protein interactions, and the networks they form, is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull-down assays and tandem affinity purification are used to detect protein interactions in an organism. Relatively new high-throughput methods such as yeast two-hybrid, mass spectrometry, microarrays, and phage display are now also used to reveal protein interaction networks.

Results: In this paper we evaluated four clustering algorithms on six interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by yeast two-hybrid (Y2H) and tandem affinity purification (TAP) methods. The predicted clusters, treated as putative protein complexes, were then benchmarked against known complexes stored in published databases.
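The benchmarking step described above can be illustrated with a short sketch. The abstract does not say how a predicted cluster is matched to a known complex, so this example assumes the commonly used overlap score |A ∩ B|^2 / (|A| * |B|) with a hypothetical cut-off of 0.2. The function names, the threshold, and the toy protein identifiers are illustrative assumptions, not values taken from the paper.

```python
def overlap_score(cluster, known_complex):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two sets of proteins."""
    if not cluster or not known_complex:
        return 0.0
    shared = len(cluster & known_complex)
    return shared * shared / (len(cluster) * len(known_complex))


def valid_predictions(clusters, complexes, threshold=0.2):
    """Return the predicted clusters that match at least one known complex.

    `threshold` is an assumed cut-off, not one reported in the abstract.
    """
    return [
        cluster
        for cluster in clusters
        if any(overlap_score(cluster, known) >= threshold for known in complexes)
    ]


# Toy data with hypothetical protein names.
predicted = [{"P1", "P2", "P3"}, {"P4", "P5"}, {"P8", "P9"}]
known = [{"P1", "P2", "P3", "P6"}, {"P4", "P7"}]

matched = valid_predictions(predicted, known)
# Fraction of predicted clusters that match a known complex.
valid_rate = len(matched) / len(predicted)
```

Under these assumptions, the fraction of matched clusters is one plausible reading of a "valid prediction rate"; a stricter threshold would count fewer clusters as valid.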

Conclusions: Although results vary with parameterization, the MCL and RNSC algorithms appear more promising and more accurate at predicting protein complexes from PPI data. They also predict more complexes in absolute numbers than the other algorithms reviewed. The spectral clustering algorithm, by contrast, achieves the highest valid prediction rate in our experiments, but it is nearly always outperformed by both RNSC and MCL in terms of geometrical accuracy, and it generates fewer valid clusters than any other algorithm reviewed. The article presents several metrics for evaluating the accuracy of such predictions. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm.
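The abstract does not define "geometrical accuracy". A minimal sketch follows, assuming it refers to the definition widely used in complex-prediction benchmarks: the geometric mean of clustering-wise sensitivity (Sn) and positive predictive value (PPV), both computed from a contingency table whose entry T[i][j] counts the proteins shared by known complex i and predicted cluster j. The function name and toy data are hypothetical.

```python
from math import sqrt


def geometric_accuracy(complexes, clusters):
    """Return (Sn, PPV, Acc) for predicted clusters against known complexes.

    Assumed definitions:
      Sn  = sum_i max_j T[i][j] / sum_i |complex_i|
      PPV = sum_j max_i T[i][j] / sum_j sum_i T[i][j]
      Acc = sqrt(Sn * PPV)
    """
    # Contingency table: rows are known complexes, columns are predicted clusters.
    table = [[len(known & cluster) for cluster in clusters] for known in complexes]

    sn_num = sum(max(row, default=0) for row in table)
    sn_den = sum(len(known) for known in complexes)

    columns = list(zip(*table))
    ppv_num = sum(max(col, default=0) for col in columns)
    ppv_den = sum(sum(col) for col in columns)

    sn = sn_num / sn_den if sn_den else 0.0
    ppv = ppv_num / ppv_den if ppv_den else 0.0
    return sn, ppv, sqrt(sn * ppv)


# Toy data with hypothetical protein names.
known = [{"P1", "P2", "P3", "P6"}, {"P4", "P7"}]
predicted = [{"P1", "P2", "P3"}, {"P4", "P5"}, {"P8", "P9"}]

sn, ppv, acc = geometric_accuracy(known, predicted)
```

Taking the geometric mean keeps a method from scoring well by trivially maximizing one component: a single cluster containing every protein would inflate Sn, while one cluster per protein would inflate PPV.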

Citing Articles

Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters.

Baltoumas F, Karatzas E, Paez-Espino D, Venetsianou N, Aplakidou E, Oulas A. Front Bioinform. 2023; 3:1157956.

PMID: 36959975 PMC: 10029925. DOI: 10.3389/fbinf.2023.1157956.


A Guide to Conquer the Biological Network Era Using Graph Theory.

Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos G. Front Bioeng Biotechnol. 2020; 8:34.

PMID: 32083072 PMC: 7004966. DOI: 10.3389/fbioe.2020.00034.


Detecting protein complexes based on a combination of topological and biological properties in protein-protein interaction network.

Sharma P, Bhattacharyya D, Kalita J. J Genet Eng Biotechnol. 2019; 16(1):217-226.

PMID: 30647725 PMC: 6296571. DOI: 10.1016/j.jgeb.2017.11.005.


HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks.

Azad A, Pavlopoulos G, Ouzounis C, Kyrpides N, Buluc A. Nucleic Acids Res. 2018; 46(6):e33.

PMID: 29315405 PMC: 5888241. DOI: 10.1093/nar/gkx1313.


Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.

Pavlopoulos G, Malliarakis D, Papanikolaou N, Theodosiou T, Enright A, Iliopoulos I. Gigascience. 2015; 4:38.

PMID: 26309733 PMC: 4548842. DOI: 10.1186/s13742-015-0077-2.

