BAR-PLUS: the Bologna Annotation Resource Plus for Functional and Structural Annotation of Protein Sequences

Overview

Journal: Nucleic Acids Res

Publisher: Oxford University Press

Specialty: Biochemistry

Date: 2011 May 31

PMID: 21622657

Citations: 15

Authors

Damiano Piovesan

Pier Luigi Martelli

Piero Fariselli

Andrea Zauli

Ivan Rossi

Rita Casadio

Abstract

We introduce BAR-PLUS (BAR(+)), a web server for functional and structural annotation of protein sequences. BAR(+) is based on a large-scale genome cross-comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters. In this version, the method takes advantage of a large-scale pairwise sequence comparison of 13,495,736 protein chains, including 988 complete proteomes. Available sequence annotation is derived from UniProtKB, GO, Pfam and PDB. When PDB templates are present within a cluster (with or without their SCOP classification), profile Hidden Markov Models (HMMs) are computed on the basis of sequence-to-structure alignment and are associated with the cluster (Cluster-HMM). The resulting library of 10,858 HMMs is made available for aligning even distantly related sequences for structural modelling. The server also provides pairwise alignments between the query sequence and structural targets, computed from the corresponding Cluster-HMM. BAR(+) in its present version allows three main categories of annotation: PDB [with or without SCOP (*)] and GO and/or Pfam; PDB (*) without GO and/or Pfam; GO and/or Pfam without PDB (*) and no annotation. Each category can further comprise clusters where GO and Pfam functional annotations are or are not statistically significant. BAR(+) is available at http://bar.biocomp.unibo.it/bar2.0.
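
As an illustration only, the annotation categories named in the abstract can be read as a decision on which annotation sources are present in a cluster. The sketch below is not the server's implementation: the `Cluster` class, its field names, the category labels, and the placeholder identifiers are all assumptions introduced here, and the statistical-significance test for GO and Pfam terms is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical stand-in for a BAR(+) cluster; field names are assumptions."""
    pdb_templates: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)
    pfam_domains: List[str] = field(default_factory=list)


def annotation_category(cluster: Cluster) -> str:
    """Return an illustrative label based on which annotation sources are present."""
    has_structure = bool(cluster.pdb_templates)
    has_function = bool(cluster.go_terms or cluster.pfam_domains)
    if has_structure and has_function:
        return "PDB and GO/Pfam"
    if has_structure:
        return "PDB without GO/Pfam"
    if has_function:
        return "GO/Pfam without PDB"
    return "no annotation"


# Placeholder identifiers, not real database entries.
example = Cluster(pdb_templates=["TEMPLATE_A"], go_terms=["GO_TERM_X"])
print(annotation_category(example))
```

Keeping structural and functional presence as two separate booleans mirrors the way the abstract phrases the categories, so a further split on statistical significance could be added without changing the branching.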

Citing Articles

A large-scale assessment of sequence database search tools for homology-based protein function prediction.

Zhang C, Freddolino L. Brief Bioinform. 2024; 25(4).

PMID: 39038936 PMC: 11262835. DOI: 10.1093/bib/bbae349.

Pseudo2GO: A Graph-Based Deep Learning Method for Pseudogene Function Prediction by Borrowing Information From Coding Genes.

Fan K, Zhang Y. Front Genet. 2020; 11:807.

PMID: 33014009 PMC: 7461887. DOI: 10.3389/fgene.2020.00807.

Graph2GO: a multi-modal attributed network embedding method for inferring protein functions.

Fan K, Guan Y, Zhang Y. Gigascience. 2020; 9(8).

PMID: 32770210 PMC: 7414417. DOI: 10.1093/gigascience/giaa081.

I-TASSER gateway: A protein structure and function prediction server powered by XSEDE.

Zheng W, Zhang C, Bell E, Zhang Y. Future Gener Comput Syst. 2019; 99:73-85.

PMID: 31427836 PMC: 6699767. DOI: 10.1016/j.future.2019.04.011.

INGA 2.0: improving protein function prediction for the dark proteome.

Piovesan D, Tosatto S. Nucleic Acids Res. 2019; 47(W1):W373-W378.

PMID: 31073595 PMC: 6602455. DOI: 10.1093/nar/gkz375.
