
A Unified Statistical Framework for Sequence Comparison and Structure Comparison

Overview
Specialty Science
Date 1998 May 30
PMID 9600892
Citations 97
Abstract

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
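
The idea of fitting an extreme-value distribution to chance scores and reading off a P value can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the background scores are synthetic, the fit uses the method of moments for a Gumbel (type I extreme-value) distribution, and the names `fit_gumbel_moments`, `p_value`, and `sample_gumbel` are hypothetical. The abstract does not state the authors' actual fitting procedure or any dependence of the parameters on sequence length.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location and scale by the method of moments.

    Assumption: a plain moment fit; the paper's own fitting scheme may differ.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    scale = std * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def p_value(score, loc, scale):
    """Probability that a chance comparison scores higher than `score`."""
    z = (score - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for small P values
    return -math.expm1(-math.exp(-z))


def sample_gumbel(rng, loc, scale):
    """Draw one Gumbel variate by inverting the cumulative distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


# Synthetic stand-in for all-vs.-all scores between unrelated domains
# (hypothetical parameters: location 20, scale 5).
rng = random.Random(0)
background = [sample_gumbel(rng, 20.0, 5.0) for _ in range(20000)]

loc, scale = fit_gumbel_moments(background)
for s in (30.0, 45.0, 60.0):
    print(f"score {s:5.1f}  P = {p_value(s, loc, scale):.3g}")
```

Higher scores map to smaller P values, so a threshold on P corresponds directly to a tolerated rate of chance matches, which is what allows sequence and structure scores to be compared on one scale.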

Citing Articles

How the technologies behind self-driving cars, social networks, ChatGPT, and DALL-E2 are changing structural biology.

Bochtler M Bioessays. 2024; 47(1):e2400155.

PMID: 39404756 PMC: 11662154. DOI: 10.1002/bies.202400155.


LoCoHD: a metric for comparing local environments of proteins.

Fazekas Z, Menyhard D, Perczel A Nat Commun. 2024; 15(1):4029.

PMID: 38740745 PMC: 11091161. DOI: 10.1038/s41467-024-48225-0.


Sequence-structure-function relationships in the microbial protein universe.

Leman J, Szczerbiak P, Renfrew P, Gligorijevic V, Berenberg D, Vatanen T Nat Commun. 2023; 14(1):2351.

PMID: 37100781 PMC: 10133388. DOI: 10.1038/s41467-023-37896-w.


InterPepRank: Assessment of Docked Peptide Conformations by a Deep Graph Network.

Johansson-Akhe I, Mirabello C, Wallner B Front Bioinform. 2022; 1:763102.

PMID: 36303778 PMC: 9581042. DOI: 10.3389/fbinf.2021.763102.


Estimating the Similarity between Protein Pockets.

Eguida M, Rognan D Int J Mol Sci. 2022; 23(20).

PMID: 36293316 PMC: 9604425. DOI: 10.3390/ijms232012462.

