
Touring Protein Space with Matt

Overview
Specialty: Biology
Date: 2011 Apr 6
PMID: 21464511
Citations: 10
Abstract

Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question as to whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative, by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
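
The idea of dividing domains into clusters from pairwise geometric dissimilarity can be illustrated with a minimal sketch. It is not the clustering scheme of the paper, whose details are not given in this abstract: average linkage, the cutoff values, the function name `average_linkage_clusters`, the domain labels, and the dissimilarity scores are all assumptions chosen for illustration, and no Matt output is used.

```python
from itertools import combinations


def average_linkage_clusters(names, dist, cutoff):
    """Merge the two closest clusters repeatedly until the smallest
    average inter-cluster dissimilarity exceeds `cutoff`."""
    clusters = [[n] for n in names]

    def linkage(a, b):
        # Average pairwise dissimilarity between members of a and b.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average dissimilarity.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical domains and made-up dissimilarity scores (not real data).
names = ["d1", "d2", "d3", "d4"]
dist = {
    frozenset(("d1", "d2")): 0.10,
    frozenset(("d1", "d3")): 0.80,
    frozenset(("d1", "d4")): 0.90,
    frozenset(("d2", "d3")): 0.70,
    frozenset(("d2", "d4")): 0.85,
    frozenset(("d3", "d4")): 0.20,
}

# A strict cutoff keeps only close relatives together; a loose one merges more.
tight = average_linkage_clusters(names, dist, cutoff=0.3)
loose = average_linkage_clusters(names, dist, cutoff=1.0)
print(tight)
print(loose)
```

Cutting the same hierarchy at different thresholds gives nested groupings, loosely analogous to comparing a clustering against SCOP at the family, superfamily, or fold level.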

Citing Articles

A review of visualisations of protein fold networks and their relationship with sequence and function.

Sykes J, Holland B, Charleston M. Biol Rev Camb Philos Soc. 2022; 98(1):243-262.

PMID: 36210328 PMC: 10092621. DOI: 10.1111/brv.12905.


Correlations between alignment gaps and nucleotide substitution or amino acid replacement.

Seo T, Redelings B, Thorne J. Proc Natl Acad Sci U S A. 2022; 119(34):e2204435119.

PMID: 35972964 PMC: 9407537. DOI: 10.1073/pnas.2204435119.


Bridging the gaps in statistical models of protein alignment.

Sumanaweera D, Allison L, Konagurthu A. Bioinformatics. 2022; 38(Suppl 1):i229-i237.

PMID: 35758809 PMC: 9235498. DOI: 10.1093/bioinformatics/btac246.


MAFFT-DASH: integrated protein sequence and structural alignment.

Rozewicki J, Li S, Amada K, Standley D, Katoh K. Nucleic Acids Res. 2019; 47(W1):W5-W10.

PMID: 31062021 PMC: 6602451. DOI: 10.1093/nar/gkz342.


Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets.

Nute M, Saleh E, Warnow T. Syst Biol. 2018; 68(3):396-411.

PMID: 30329135 PMC: 6472439. DOI: 10.1093/sysbio/syy068.


References
1. Murzin A, Brenner S, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995; 247(4):536-40. DOI: 10.1006/jmbi.1995.0159.

2. Holland T, Veretnik S, Shindyalov I, Bourne P. Partitioning protein structures into domains: why is it so difficult? J Mol Biol. 2006; 361(3):562-90. DOI: 10.1016/j.jmb.2006.05.060.

3. Madej T, Gibrat J, Bryant S. Threading a database of protein cores. Proteins. 1995; 23(3):356-69. DOI: 10.1002/prot.340230309.

4. Day R, Beck D, Armen R, Daggett V. A consensus view of fold space: combining SCOP, CATH, and the Dali Domain Dictionary. Protein Sci. 2003; 12(10):2150-60. PMC: 2366924. DOI: 10.1110/ps.0306803.

5. Rost B. Did evolution leap to create the protein universe? Curr Opin Struct Biol. 2002; 12(3):409-16. DOI: 10.1016/s0959-440x(02)00337-8.