The Predictive Power of the CluSTr Database

Overview
Journal Bioinformatics
Specialty Biology
Date 2005 Jun 18
PMID 15961444
Citations 20
Abstract

Summary: The CluSTr database employs a fully automatic single-linkage hierarchical clustering method based on a similarity matrix. To compute the matrix, all-against-all pairwise comparisons between protein sequences are first performed using the Smith-Waterman algorithm. The statistical significance of the similarity scores is then assessed using a Monte Carlo analysis, yielding Z-values, which are used to populate the matrix. This paper describes automated annotation experiments that quantify the predictive power, and hence the biological relevance, of the CluSTr data. The experiments used the UniProt data-mining framework to derive annotation predictions from combinations of InterPro and CluSTr data. We show that this combination of data sources greatly increases the precision of predictions made by the data-mining framework compared with the use of InterPro data alone. We conclude that the CluSTr approach to clustering proteins makes a valuable contribution to traditional protein classifications.

Availability: http://www.ebi.ac.uk/clustr/.
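
The three steps named in the summary (Smith-Waterman scoring, Monte Carlo Z-values, single-linkage clustering) can be illustrated with a minimal Python sketch. It is not the CluSTr implementation: the match/mismatch/gap scores, the shuffle-based null model, the number of shuffles, the Z threshold of 8.0, and the toy sequences p1 to p3 are all assumptions chosen for illustration. Cutting a single-linkage hierarchy at one similarity level is equivalent to taking connected components of the graph that links every pair at or above that level, which is what `single_linkage` does.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not CluSTr's.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(a, b, shuffles=100, rng=None):
    """Monte Carlo Z-value: real score versus scores against shuffled b."""
    rng = rng or random.Random(0)
    real = sw_score(a, b)
    letters = list(b)
    null = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        null.append(sw_score(a, "".join(letters)))
    sd = pstdev(null)
    # Convention for this sketch: report 0.0 when the null has no spread.
    return (real - mean(null)) / sd if sd > 0 else 0.0


def single_linkage(names, z, threshold):
    """Clusters at one level of the single-linkage hierarchy.

    Pairs with Z >= threshold are linked; clusters are the connected
    components, found here with a small union-find.
    """
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (p, q), score in z.items():
        if score >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy sequences; p2 is p1 with a few substitutions.
seqs = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "p2": "MKTAYIGKQRQISFVRSHFSRQLEEKLGLIEVQ",
    "p3": "GSPLWDNCHYFWMPTNGSPLWDNCHYFWMPTNG",
}
rng = random.Random(42)
names = sorted(seqs)
# All-against-all comparison populates the (upper-triangular) Z matrix.
z = {(p, q): z_value(seqs[p], seqs[q], rng=rng) for p, q in combinations(names, 2)}
print(single_linkage(names, z, threshold=8.0))
```

Because single linkage merges on any one sufficiently strong pair, two proteins can share a cluster without being directly similar, as long as a chain of similar intermediates connects them. Varying `threshold` walks up and down the hierarchy.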

Citing Articles

Entropy-driven partitioning of the hierarchical protein space.

Rappoport N, Stern A, Linial N, Linial M. Bioinformatics. 2014; 30(17):i624-30.

PMID: 25161256 PMC: 4147929. DOI: 10.1093/bioinformatics/btu478.


SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage.

Arnold R, Goldenberg F, Mewes H, Rattei T. Nucleic Acids Res. 2013; 42(Database issue):D279-84.

PMID: 24165881 PMC: 3965014. DOI: 10.1093/nar/gkt970.


SUS-BAR: a database of pig proteins with statistically validated structural and functional annotation.

Piovesan D, Profiti G, Martelli P, Fariselli P, Fontanesi L, Casadio R. Database (Oxford). 2013; 2013:bat065.

PMID: 24065691 PMC: 3781388. DOI: 10.1093/database/bat065.


How to inherit statistically validated annotation within BAR+ protein clusters.

Piovesan D, Martelli P, Fariselli P, Profiti G, Zauli A, Rossi I. BMC Bioinformatics. 2013; 14 Suppl 3:S4.

PMID: 23514411 PMC: 3584929. DOI: 10.1186/1471-2105-14-S3-S4.


PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees.

Mi H, Muruganujan A, Thomas P. Nucleic Acids Res. 2012; 41(Database issue):D377-86.

PMID: 23193289 PMC: 3531194. DOI: 10.1093/nar/gks1118.