Evolutionarily Consistent Families in SCOP: Sequence, Structure and Function

Overview
Journal BMC Struct Biol
Publisher Biomed Central
Date 2012 Oct 20
PMID 23078280
Citations 8
Abstract

Background: SCOP is a hierarchical domain classification system for proteins of known structure. The superfamily level has a clear definition: protein domains belong to the same superfamily if there is structural, functional and sequence evidence for a common evolutionary ancestor. Superfamilies are subclassified into families; however, there is no such clear basis for the family-level groupings. Do SCOP families group together domains with sequence similarity, domains with similar structure, or domains with a common function? We answer these questions and, most importantly, ask whether each family represents a distinct phylogenetic group within a superfamily.

Results: Four phylogenetic trees were generated for each superfamily: one derived from a multiple sequence alignment, one based on structural distances, and two based on the presence or absence of the GO terms or EC numbers assigned to domains. The topologies and confidence values of the resulting trees were compared with the SCOP family classification.
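The abstract does not say which distance measure or tree-building method was used for the annotation-based trees. As a minimal sketch, assuming Jaccard distance between each domain's set of annotation terms and UPGMA (average-linkage) clustering, the Python below builds a rooted tree from presence/absence data and tests whether each family forms a single clade. The domain names, term labels and family assignments are hypothetical placeholders, not SCOP, GO or EC data, and the helper functions are illustrative rather than the authors' pipeline.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of annotation terms."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple rooted tree."""
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist(names[i], names[j])
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        del d[(i, j)]
        new = next_id
        next_id += 1
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, new)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[new] = ((tree_i, tree_j), n_i + n_j)
    [(tree, _size)] = list(clusters.values())
    return tree


def leaves(tree):
    """Set of leaf names below a node."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) | leaves(tree[1])
    return {tree}


def clades(tree):
    """Leaf sets of every subtree (every rooted clade)."""
    if not isinstance(tree, tuple):
        return [{tree}]
    return [leaves(tree)] + clades(tree[0]) + clades(tree[1])


def family_is_clade(tree, members):
    """True if the family's members are exactly the leaves of one subtree."""
    return set(members) in clades(tree)


# Hypothetical domains with placeholder annotation terms (not real GO/EC data).
annotations = {
    "d1": {"term_a", "term_b"},
    "d2": {"term_a", "term_b", "term_c"},
    "d3": {"term_d"},
    "d4": {"term_d", "term_e"},
}
families = {"fam1": ["d1", "d2"], "fam2": ["d3", "d4"]}

names = sorted(annotations)
tree = upgma(names, lambda a, b: jaccard_distance(annotations[a], annotations[b]))
consistent = {f: family_is_clade(tree, m) for f, m in families.items()}
print(tree)        # (('d1', 'd2'), ('d3', 'd4'))
print(consistent)  # {'fam1': True, 'fam2': True}
```

Because UPGMA yields a rooted tree, this sketch only checks rooted clades. For an unrooted sequence or structure tree, one would instead test whether a family's members fall on one side of some bipartition of the tree.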

Conclusions: We show that SCOP family groupings are evolutionarily consistent to a very high degree with respect to classical sequence phylogenetics. Trees built from (automatically generated) structural distances correlate well with the SCOP (hand-annotated) groupings but are not always consistent with them. Trees derived from functional data are less consistent with the family level than those from structure or sequence, though the majority still agree. Much of the GO and EC annotation applies directly to one family or to a subset of a family; relatively few terms apply at the superfamily level. Maximum sequence diversity within a family is on average 22% but close to zero for superfamilies.
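The abstract does not define "maximum sequence diversity". One plausible reading, assumed here, is 100 minus the lowest pairwise percent identity among a family's aligned members, with identity counted only over columns where neither sequence has a gap. A minimal Python sketch on a hypothetical three-sequence alignment:

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def max_diversity(aligned):
    """Largest pairwise diversity, taken here as 100 - percent identity."""
    return max(
        (100.0 - percent_identity(a, b) for a, b in combinations(aligned, 2)),
        default=0.0,
    )


# Hypothetical aligned family members ('-' marks a gap).
family = ["ACDEFGHIKL", "ACDEFGHIKV", "ACD-FGHMRV"]
print(round(max_diversity(family), 1))  # 33.3
```

Gap columns are excluded from the denominator here; other conventions (for example, dividing by the length of the shorter ungapped sequence) give different numbers for the same alignment.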

Citing Articles

FunFam protein families improve residue level molecular function prediction.

Scheibenreif L, Littmann M, Orengo C, Rost B. BMC Bioinformatics. 2019; 20(1):400.

PMID: 31319797 PMC: 6639920. DOI: 10.1186/s12859-019-2988-x.


The SUPERFAMILY 2.0 database: a significant proteome update and a new webserver.

Pandurangan A, Stahlhacke J, Oates M, Smithers B, Gough J. Nucleic Acids Res. 2018; 47(D1):D490-D494.

PMID: 30445555 PMC: 6324026. DOI: 10.1093/nar/gky1130.


What is an archaeon and are the Archaea really unique?

Harish A. PeerJ. 2018; 6:e5770.

PMID: 30357005 PMC: 6196074. DOI: 10.7717/peerj.5770.


A Subset of Ubiquitin-Conjugating Enzymes Is Essential for Plant Immunity.

Zhou B, Mural R, Chen X, Oates M, Connor R, Martin G. Plant Physiol. 2016; 173(2):1371-1390.

PMID: 27909045 PMC: 5291023. DOI: 10.1104/pp.16.01190.


Functional classification of CATH superfamilies: a domain-based approach for protein function annotation.

Das S, Lee D, Sillitoe I, Dawson N, Lees J, Orengo C. Bioinformatics. 2015; 31(21):3460-7.

PMID: 26139634 PMC: 4612221. DOI: 10.1093/bioinformatics/btv398.

