Prediction of Protein Domain Boundaries from Inverse Covariances

Overview

Journal: Proteins

Date: 2012 Sep 19

PMID: 22987736

Citations: 8

Authors

Michael I Sadowski

Abstract

Since the time when relatively few structures had been solved, it has been known that longer protein chains often contain multiple domains, which may fold separately and act as reusable functional modules found in many contexts. In many structural biology tasks, particularly structure prediction, it is very useful to identify domains within the structure and analyze these regions separately. However, this task has proven exceptionally difficult when using sequence data alone, with relatively little improvement over the naive method of choosing boundaries based on the size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing-based approach and methods based on building alpha-carbon models, and compare their performance with a length-based predictor, a homology search method, and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different from the homology-based method. Considering only multidomain targets, the kernel smoothing method outperforms all of the published methods except DLP-SVM. The kernel smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
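The abstract does not spell out how the kernel smoothing is applied to the predicted contacts, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes that contacts arrive as hypothetical `(i, j, score)` triples, that a domain boundary is a position crossed by few predicted contacts, and that a Gaussian kernel is used for smoothing. The function names and the `bandwidth` and `min_domain` parameters are invented for this example.

```python
import math


def crossing_profile(length, contacts):
    """Return profile[b] = summed score of contacts (i, j) with i < b <= j.

    Assumption: a contact "crosses" position b when its two residues lie on
    opposite sides of a cut placed just before residue b.
    """
    diff = [0.0] * (length + 1)
    for i, j, score in contacts:
        lo, hi = min(i, j), max(i, j)
        diff[lo + 1] += score  # the contact starts crossing at b = lo + 1
        diff[hi + 1] -= score  # and stops crossing after b = hi
    profile, running = [], 0.0
    for b in range(length):
        running += diff[b]
        profile.append(running)
    return profile


def gaussian_smooth(values, bandwidth):
    """Smooth a 1D profile with a truncated, renormalised Gaussian kernel."""
    n = len(values)
    radius = int(3 * bandwidth)
    smoothed = []
    for c in range(n):
        num = den = 0.0
        for k in range(max(0, c - radius), min(n, c + radius + 1)):
            w = math.exp(-0.5 * ((k - c) / bandwidth) ** 2)
            num += w * values[k]
            den += w
        smoothed.append(num / den)
    return smoothed


def predict_boundary(length, contacts, bandwidth=5.0, min_domain=30):
    """Pick the position with the lowest smoothed crossing density.

    Returns None when the chain is too short to hold two domains of at
    least min_domain residues (an assumed cut-off, not from the paper).
    """
    if length < 2 * min_domain:
        return None
    smoothed = gaussian_smooth(crossing_profile(length, contacts), bandwidth)
    candidates = range(min_domain, length - min_domain + 1)
    return min(candidates, key=lambda b: smoothed[b])


# Toy input: two 50-residue segments with contacts only inside each segment.
toy_contacts = [(i, i + 10, 1.0) for i in range(0, 40)]
toy_contacts += [(i, i + 10, 1.0) for i in range(50, 90)]
boundary = predict_boundary(100, toy_contacts)
```

In this sketch a single boundary is chosen; handling single-domain targets would need an additional decision rule (for example, a threshold on the smoothed minimum), which is left out because the abstract gives no details about it.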

Citing Articles

ConPlot: web-based application for the visualization of protein contact maps integrated with other data.

Sanchez Rodriguez F, Mesdaghi S, Simpkin A, Burgos-Marmol J, Murphy D, Uski V Bioinformatics. 2021; 37(17):2763-2765.

PMID: 34499718 PMC: 8428603. DOI: 10.1093/bioinformatics/btab049.

In silico prediction of structure and function for a large family of transmembrane proteins that includes human Tmem41b.

Mesdaghi S, Murphy D, Sanchez Rodriguez F, Burgos-Marmol J, Rigden D F1000Res. 2021; 9:1395.

PMID: 33520197 PMC: 7818093. DOI: 10.12688/f1000research.27676.2.

Co-evolution techniques are reshaping the way we do structural bioinformatics.

De Oliveira S, Deane C F1000Res. 2017; 6:1224.

PMID: 28781768 PMC: 5531156. DOI: 10.12688/f1000research.11543.1.

Applications of contact predictions to structural biology.

Simkovic F, Ovchinnikov S, Baker D, Rigden D IUCrJ. 2017; 4(Pt 3):291-300.

PMID: 28512576 PMC: 5414403. DOI: 10.1107/S2052252517005115.

Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds.

Simkovic F, Thomas J, Keegan R, Winn M, Mayans O, Rigden D IUCrJ. 2016; 3(Pt 4):259-70.

PMID: 27437113 PMC: 4937781. DOI: 10.1107/S2052252516008113.
