
Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-domain Proteins

Overview
Journal Genome Res
Specialty Genetics
Date 2001 Oct 10
PMID 11591640
Citations 52
Abstract

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
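The pair comparison behind these percentages can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each protein is reduced to the set of scop superfamilies it contains plus a single functional category (standing in for the SWISS-PROT keyword thesaurus), and domain order, repeated domains and sequence coverage are ignored. The protein records, superfamily labels and class names (`pair_class`, `conservation_by_class`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (protein id, set of scop superfamilies, functional category).
# Superfamily labels "A", "B", "C" and the categories are invented for illustration.
TOY_PROTEINS = [
    ("P1", frozenset({"A"}), "kinase"),
    ("P2", frozenset({"A"}), "kinase"),
    ("P3", frozenset({"A"}), "phosphatase"),
    ("P4", frozenset({"A", "B"}), "transport"),
    ("P5", frozenset({"A", "B"}), "transport"),
    ("P6", frozenset({"A", "C"}), "signaling"),
]


def pair_class(sf_a, sf_b):
    """Assign a protein pair to a comparison class from its superfamily sets.

    Returns None when the two proteins share no superfamily (pair is skipped).
    Assumption: domain order, repeated domains and coverage are ignored.
    """
    if not sf_a & sf_b:
        return None
    if len(sf_a) == 1 and len(sf_b) == 1:
        return "single"  # two single-domain proteins, same superfamily
    if len(sf_a) == 1 or len(sf_b) == 1:
        return "mixed"  # one single-domain and one multi-domain protein
    if sf_a == sf_b:
        return "multi_same_combination"  # identical superfamily combination
    return "multi_partial"  # multi-domain pair sharing only some superfamilies


def conservation_by_class(proteins):
    """Fraction of pairs in each class whose functional categories match."""
    same = defaultdict(int)
    total = defaultdict(int)
    for (_, sf_a, fn_a), (_, sf_b, fn_b) in combinations(proteins, 2):
        cls = pair_class(sf_a, sf_b)
        if cls is None:
            continue
        total[cls] += 1
        same[cls] += fn_a == fn_b  # True counts as 1
    return {cls: same[cls] / total[cls] for cls in total}


if __name__ == "__main__":
    for cls, rate in sorted(conservation_by_class(TOY_PROTEINS).items()):
        print(f"{cls}: {rate:.2f}")
```

The toy data were chosen so that the identical-combination pair conserves function while partial-overlap pairs do not; the resulting fractions illustrate the bookkeeping only and carry no biological meaning.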
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them. At a given amount of sequence similarity, functional divergence is always about twofold greater for pairs of multi-domain proteins (sharing similarity over a single domain) than for pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
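The similarity-dependent comparison can likewise be sketched by binning protein pairs by sequence similarity and measuring, per bin, the fraction of pairs whose functions differ. Percent identity as the similarity measure, the 10-point bin width, the function name `divergence_by_identity` and the toy pair lists are all assumptions made for illustration; they are not the paper's data or its exact similarity score.

```python
from collections import defaultdict

# Hypothetical toy pairs: (percent sequence identity, functions match?).
SINGLE_DOMAIN_PAIRS = [(22, True), (27, False), (31, True), (36, True), (45, True), (48, True)]
MULTI_DOMAIN_PAIRS = [(21, False), (26, False), (33, True), (37, False), (42, True), (47, True)]


def divergence_by_identity(pairs, bin_width=10):
    """Fraction of pairs with different functions in each identity bin.

    A pair with identity x falls in the bin starting at
    floor(x / bin_width) * bin_width, so bin 20 covers 20 <= x < 30.
    """
    diverged = defaultdict(int)
    total = defaultdict(int)
    for identity, same_function in pairs:
        start = int(identity // bin_width) * bin_width
        total[start] += 1
        diverged[start] += not same_function  # True counts as 1
    return {start: diverged[start] / total[start] for start in sorted(total)}


if __name__ == "__main__":
    single = divergence_by_identity(SINGLE_DOMAIN_PAIRS)
    multi = divergence_by_identity(MULTI_DOMAIN_PAIRS)
    # Compare the two curves only in bins populated by both groups.
    for start in sorted(single.keys() & multi.keys()):
        print(f"{start}%: single {single[start]:.2f}, multi {multi[start]:.2f}")
```

In this toy input the multi-domain fractions sit at or above the single-domain ones in every bin, which is the kind of comparison described above; a real analysis would also need enough pairs per bin for each fraction to be statistically meaningful.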

Citing Articles

AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.

Maia G, Benetti Filho V, Kawagoe E, Soratto T, Moreira R, Grisard E Front Genet. 2022; 13:1020100.

PMID: 36482896 PMC: 9723129. DOI: 10.3389/fgene.2022.1020100.


Biogenesis, conservation, and function of miRNA in liverworts.

Pietrykowska H, Sierocka I, Zielezinski A, Alisha A, Carrasco-Sanchez J, Jarmolowski A J Exp Bot. 2022; 73(13):4528-4545.

PMID: 35275209 PMC: 9291395. DOI: 10.1093/jxb/erac098.


Pathway-specific protein domains are predictive for human diseases.

Shim J, Kim J, Shin J, Lee J, Lee I PLoS Comput Biol. 2019; 15(5):e1007052.

PMID: 31075101 PMC: 6530867. DOI: 10.1371/journal.pcbi.1007052.


SpidermiR: An R/Bioconductor Package for Integrative Analysis with miRNA Data.

Cava C, Colaprico A, Bertoli G, Graudenzi A, C Silva T, Olsen C Int J Mol Sci. 2017; 18(2).

PMID: 28134831 PMC: 5343810. DOI: 10.3390/ijms18020274.


Structural and Functional Characterization of a Ruminal β-Glycosidase Defines a Novel Subfamily of Glycoside Hydrolase Family 3 with Permuted Domain Topology.

Ramirez-Escudero M, Del Pozo M, Marin-Navarro J, Gonzalez B, Golyshin P, Polaina J J Biol Chem. 2016; 291(46):24200-24214.

PMID: 27679487 PMC: 5104943. DOI: 10.1074/jbc.M116.747527.

