Clustering of Cognate Proteins Among Distinct Proteomes Derived from Multiple Links to a Single Seed Sequence

Overview

Journal: BMC Bioinformatics

Publisher: BioMed Central

Specialty: Biology

Date: 2008 Mar 7

PMID: 18321373

Citations: 4

Authors

Adriano Barbosa-Silva

Venkata P Satagopam

Reinhard Schneider

J Miguel Ortega

Abstract

Background: Modern proteomes evolved by modification of pre-existing ones. In comparative biology it is important to identify related proteins as members of the same cognate group, because a characterized putative homolog can provide clues about the function of uncharacterized proteins in that group. Databases of related proteins typically focus on proteins from completely sequenced genomes. However, relatively few organisms have had their genomes fully sequenced, so the currently available databases of cognate proteins ignore many proteins, even though a large number of important genes have been functionally described only in these incomplete proteomes.

Results: We have developed a method that clusters cognate proteins from multiple organisms starting from a single sequence, through connectivity saturation with that Seed sequence. We show that the resulting clusters agree with those produced by some other approaches based on full-genome comparison.
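The abstract does not spell out the procedure, so the following is only a minimal sketch of what growing a cluster from a single Seed until saturation could look like, not the authors' implementation. It assumes that similarity hits are already available as a mapping from each protein to the proteins it matches, and that a protein joins the cluster only when the link is reciprocal. The function name `seed_cluster`, the `hits` structure, the reciprocity rule, and all protein identifiers are hypothetical.

```python
from collections import deque


def seed_cluster(seed, hits):
    """Grow a cluster outward from `seed` until no new protein can be added.

    `hits` maps a protein ID to the set of protein IDs it matches
    (for example, similarity-search hits above some cutoff).
    Assumption: a candidate joins only if the link is reciprocal.
    """
    cluster = {seed}
    queue = deque([seed])
    while queue:  # stops when the cluster is saturated
        current = queue.popleft()
        for candidate in hits.get(current, set()):
            if candidate in cluster:
                continue
            # Keep only candidates that also match `current` back.
            if current in hits.get(candidate, set()):
                cluster.add(candidate)
                queue.append(candidate)
    return cluster


# Toy, made-up hit lists for proteins from several organisms.
hits = {
    "human_P1": {"mouse_P1", "yeast_P1", "fly_X"},
    "mouse_P1": {"human_P1", "yeast_P1"},
    "yeast_P1": {"human_P1", "mouse_P1", "worm_P1"},
    "worm_P1": {"yeast_P1"},
    "fly_X": {"fly_Y"},
}

print(sorted(seed_cluster("human_P1", hits)))
# → ['human_P1', 'mouse_P1', 'worm_P1', 'yeast_P1']
```

In this toy input, `fly_X` is excluded because it does not match the Seed back, while `worm_P1` is reached indirectly through `yeast_P1`. Only proteins connected to the Seed are ever visited, which is consistent with the claim that clustering around one protein of interest is cheaper than clustering whole proteomes.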

Conclusion: The method produced results as reliable as those of conventional clustering approaches. Generating clusters only for individual proteins of interest is less time-consuming than generating clusters for whole proteomes.

Citing Articles

BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.

Velloso H, Vialle R, Ortega J. BMC Res Notes. 2015; 8:206.

PMID: 26032494 PMC: 4467627. DOI: 10.1186/s13104-015-1190-0.

Preimplantation development regulatory pathway construction through a text-mining approach.

Donnard E, Barbosa-Silva A, Guedes R, Fernandes G, Velloso H, Kohn M. BMC Genomics. 2012; 12 Suppl 4:S3.

PMID: 22369103 PMC: 3287586. DOI: 10.1186/1471-2164-12-S4-S3.

Amino acids biosynthesis and nitrogen assimilation pathways: a great genomic deletion during eukaryotes evolution.

Guedes R, Prosdocimi F, Fernandes G, Moura L, Ribeiro H, Ortega J. BMC Genomics. 2012; 12 Suppl 4:S2.

PMID: 22369087 PMC: 3287585. DOI: 10.1186/1471-2164-12-S4-S2.

Filling the gap between biology and computer science.

Aguilar-Ruiz J, Moore J, Ritchie M. BioData Min. 2008; 1(1):1.

PMID: 18822148 PMC: 2547862. DOI: 10.1186/1756-0381-1-1.
