
Quantitative Assessment of Relationship Between Sequence Similarity and Function Similarity

Overview
Journal BMC Genomics
Publisher Biomed Central
Specialty Genetics
Date 2007 Jul 11
PMID 17620139
Citations 45
Abstract

Background: Comparative sequence analysis is considered the first step towards annotating new proteins in genome annotation. However, sequence comparison may lead to the creation and propagation of function assignment errors. Thus, it is important to analyze the quality of sequence-based function assignment thoroughly and systematically using large-scale data.

Results: We present an analysis of the relationship between sequence similarity and function similarity for the proteins in four model organisms, i.e., Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster. Using a measure of functional similarity based on the three categories of Gene Ontology (GO) classifications (biological process, molecular function, and cellular component), we quantified the correlation between functional similarity and sequence similarity, measured by sequence identity or by the statistical significance of the alignment, and compared this correlation against that for randomly chosen protein pairs.
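As a rough illustration of the kind of quantity described above, the sketch below correlates sequence identity with a simple functional similarity score for a few protein pairs. It is not the measure used in the study: the Jaccard overlap of GO term sets is a stand-in assumption, the helper names `jaccard` and `pearson` are hypothetical, and the three protein pairs are invented toy data.

```python
from math import sqrt
from statistics import mean


def jaccard(terms_a, terms_b):
    """Jaccard overlap of two GO term sets (an assumed stand-in measure)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented toy data: (percent identity, GO terms of protein 1, GO terms of protein 2).
pairs = [
    (90.0, {"GO:0003677", "GO:0006355"}, {"GO:0003677", "GO:0006355"}),
    (60.0, {"GO:0003677", "GO:0006355"}, {"GO:0003677"}),
    (30.0, {"GO:0003677"}, {"GO:0005524"}),
]

identity = [p[0] for p in pairs]
func_sim = [jaccard(p[1], p[2]) for p in pairs]
r = pearson(identity, func_sim)
print(f"functional similarity per pair: {func_sim}")
print(f"Pearson r = {r:.3f}")
```

In practice each GO category (biological process, molecular function, cellular component) would be scored separately, and the alignment E-value could replace percent identity as the sequence similarity axis.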

Conclusion: Various sequence-function relationships were identified from BLAST versus PSI-BLAST, sequence identity versus Expectation Value, GO indices versus semantic similarity approaches, and within genome versus between genome comparisons, for the three GO categories. Our study provides a benchmark to estimate the confidence in assignment of functions purely based on sequence similarity.
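One way such a benchmark could be read off in practice is to bin protein pairs by sequence identity and report the mean functional similarity per bin. The sketch below assumes this binning approach, which the abstract does not specify; the function name `mean_similarity_by_bin`, the 20-point bin width, and the scores are all hypothetical.

```python
from collections import defaultdict


def mean_similarity_by_bin(pairs, bin_width=20):
    """Average functional similarity per percent-identity bin.

    pairs: iterable of (percent_identity, functional_similarity).
    Returns {bin_start: mean similarity}, ordered by bin start.
    """
    bins = defaultdict(list)
    for identity, similarity in pairs:
        # Clamp 100% identity into the top bin instead of opening a new one.
        start = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        bins[start].append(similarity)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}


# Invented toy scores: (percent identity, functional similarity in [0, 1]).
toy_pairs = [(95.0, 1.0), (85.0, 0.8), (45.0, 0.5), (25.0, 0.1), (100.0, 1.0)]
table = mean_similarity_by_bin(toy_pairs)
for start, value in table.items():
    print(f"{start:3d}-{start + 20:3d}% identity: mean similarity {value:.2f}")
```

Running the same function on randomly chosen protein pairs would give a baseline against which each bin can be compared.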

Citing Articles

Cognitive Impact of Neurotropic Pathogens: Investigating Molecular Mimicry through Computational Methods.

Buttiker P, Boukherissa A, Weissenberger S, Ptacek R, Anders M, Raboch J Cell Mol Neurobiol. 2024; 44(1):72.

PMID: 39467848 PMC: 11519248. DOI: 10.1007/s10571-024-01509-x.


Multi-omics profiling and experimental verification of tertiary lymphoid structure-related genes: molecular subgroups, immune infiltration, and prognostic implications in lung adenocarcinoma.

Wu S, Pan J, Pan Q, Zeng L, Liang R, Li Y Front Immunol. 2024; 15:1453220.

PMID: 39364403 PMC: 11446812. DOI: 10.3389/fimmu.2024.1453220.


Comparative proteomic profiling of the ovine and human PBMC inflammatory response.

Elkhamary A, Gerner I, Bileck A, Oreff G, Gerner C, Jenner F Sci Rep. 2024; 14(1):14939.

PMID: 38942936 PMC: 11213919. DOI: 10.1038/s41598-024-66059-0.


Genome mining of : targeting SufD as a novel drug candidate through characterization and inhibitor screening.

Gorityala N, Baidya A, Sagurthi S Front Microbiol. 2024; 15:1369645.

PMID: 38686111 PMC: 11057465. DOI: 10.3389/fmicb.2024.1369645.


Comparative genomics reveals insight into the phylogeny and habitat adaptation of novel species, an endophytic actinomycete associated with scab lesions on potato tubers.

Wannawong T, Mhuantong W, Macharoen P, Niemhom N, Sitdhipol J, Chaiyawan N Front Plant Sci. 2024; 15:1346574.

PMID: 38601305 PMC: 11004387. DOI: 10.3389/fpls.2024.1346574.

