BAliBASE (Benchmark Alignment DataBASE): Enhancements for Repeats, Transmembrane Sequences and Circular Permutations

Overview

Journal Nucleic Acids Res

Publisher Oxford University Press

Specialty Biochemistry

Date 2000 Jan 11

PMID 11125126

Citations 45

Authors

A Bahr

J D Thompson

J C Thierry

O Poch

Abstract

BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high-quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.

Citing Articles

Sequence Flow: interactive web application for visualizing partial order alignments.

Zdablasz K, Lisiecka A, Dojer N BMC Genomics. 2024; 25(1):973.

PMID: 39415087 PMC: 11483981. DOI: 10.1186/s12864-024-10886-y.

Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone.

Pantolini L, Studer G, Pereira J, Durairaj J, Tauriello G, Schwede T Bioinformatics. 2024; 40(1).

PMID: 38175775 PMC: 10792726. DOI: 10.1093/bioinformatics/btad786.

DCAlign v1.0: aligning biological sequences using co-evolution models and informed priors.

Muntoni A, Pagnani A Bioinformatics. 2023; 39(9).

PMID: 37647658 PMC: 10491954. DOI: 10.1093/bioinformatics/btad537.

Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families.

Hubley R, Wheeler T, Smit A NAR Genom Bioinform. 2022; 4(2):lqac040.

PMID: 35591887 PMC: 9112768. DOI: 10.1093/nargab/lqac040.

Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences.

Kostenko D, Korotkov E Int J Mol Sci. 2022; 23(7).

PMID: 35409125 PMC: 8998981. DOI: 10.3390/ijms23073764.
