Initial Cluster Analysis

Overview
Journal J Comput Biol
Date 2017 Aug 4
PMID 28771374
Citations 4
Abstract

We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L that contains D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so clustered, and how significant is this clustering? We approach this question using the minimum description length principle and illustrate its application by analyzing residues that distinguish translational initiation and elongation factor guanosine triphosphatases (GTPases) from other P-loop GTPases. Within a structure of yeast elongation factor 1α, these residues form a significant cluster centered on a region implicated in guanine nucleotide exchange. Various biomedical questions may be cast as the abstract problem considered here.
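
To make the abstract problem concrete, the following is a minimal sketch of a generic two-part description-length comparison. It is an illustration under stated assumptions, not the formulation used in the article: the function name `best_initial_cluster`, the uniform costs charged for stating the cut point and the prefix count, and the restriction of cut points to positions holding a 1 are all choices made here. The null description lists which D of the L positions hold a 1; the alternative states a cut point k and the number d of 1s before it, then describes the prefix and the suffix separately.

```python
from math import comb, log2

def best_initial_cluster(bits):
    """Return (bits_saved, prefix_length, ones_in_prefix) for the best cut.

    Hypothetical helper: a generic two-part code, not the article's method.
    A result of (0.0, 0, 0) means no cut beats the null description.
    """
    L = len(bits)
    D = sum(bits)
    # Null description: which D of the L positions hold a 1.
    null_bits = log2(comb(L, D))
    # Assumed model cost: cut point (L choices) plus prefix count (D + 1 choices).
    model_bits = log2(L) + log2(D + 1)

    best = (0.0, 0, 0)
    d = 0  # running count of 1s in the prefix
    for k, b in enumerate(bits, start=1):
        d += b
        # Only consider cuts placed right after a 1, and never the whole string.
        if b != 1 or k == L:
            continue
        # Describe the prefix and the suffix arrangements separately.
        data_bits = log2(comb(k, d)) + log2(comb(L - k, D - d))
        saved = null_bits - (model_bits + data_bits)
        if saved > best[0]:
            best = (saved, k, d)
    return best

# Eight 1s packed at the start of a length-100 string.
clustered = [1] * 8 + [0] * 92
saved, k, d = best_initial_cluster(clustered)
```

A positive number of bits saved indicates that the clustered description is shorter than the null one; under this kind of coding argument, each bit saved corresponds roughly to a factor of two in evidence against the null, though the exact calibration depends on the model costs assumed above.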

Citing Articles

SPARC: Structural properties associated with residue constraints.

Neuwald A, Yang H, Nixon B. Comput Struct Biotechnol J. 2022; 20:1702-1715.

PMID: 35495120 PMC: 9020082. DOI: 10.1016/j.csbj.2022.04.005.


Identifying Function Determining Residues in Neuroimmune Semaphorin 4A.

Chapoval S, Lee M, Lemmer A, Ajayi O, Qi X, Neuwald A. Int J Mol Sci. 2022; 23(6).

PMID: 35328445 PMC: 8953949. DOI: 10.3390/ijms23063024.


Statistical investigations of protein residue direct couplings.

Neuwald A, Altschul S. PLoS Comput Biol. 2019; 14(12):e1006237.

PMID: 30596639 PMC: 6329532. DOI: 10.1371/journal.pcbi.1006237.


Inferring joint sequence-structural determinants of protein functional specificity.

Neuwald A, Aravind L, Altschul S. Elife. 2018; 7.

PMID: 29336305 PMC: 5770160. DOI: 10.7554/eLife.29880.
