FUpred: Detecting Protein Domains Through Deep-learning-based Contact Map Prediction

Overview

Journal: Bioinformatics

Publisher: Oxford University Press

Specialty: Biology

Date: 2020 Apr 1

PMID: 32227201

Citations: 29

Authors

Wei Zheng

Xiaogen Zhou

Qiqige Wuyun

Robin Pearce

Yang Li

Yang Zhang

Abstract

Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence.

Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries using contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts while minimizing the number of inter-domain contacts in the contact maps. FUpred was tested on a large-scale dataset of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% and 5.3% higher than the best machine learning-based and threading-based methods, respectively. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins.
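The core idea stated in the Results (many intra-domain contacts, few inter-domain contacts) can be illustrated with a minimal sketch. This is not FUpred's scoring function, which the abstract does not specify. It is a simplified stand-in that assumes a binary contact map, a single continuous cut into two domains, and a normalized-cut style ratio chosen here for illustration. The names `split_score`, `best_boundary`, `toy_map` and the `min_len` parameter are hypothetical.

```python
def split_score(contacts, k):
    """Score a cut that puts residues [0, k) in one domain and [k, L) in the other.

    Assumption: a normalized-cut style ratio (lower is better), used here only
    as an illustrative stand-in for a contact-based domain score.
    """
    n = len(contacts)
    intra_a = intra_b = inter = 0
    for i in range(n):
        for j in range(i + 1, n):
            if not contacts[i][j]:
                continue
            if j < k:          # both residues in the first segment
                intra_a += 1
            elif i >= k:       # both residues in the second segment
                intra_b += 1
            else:              # the contact crosses the cut
                inter += 1
    if intra_a + inter == 0 or intra_b + inter == 0:
        return float("inf")    # a segment with no contacts is not a useful domain
    return inter / (intra_a + inter) + inter / (intra_b + inter)


def best_boundary(contacts, min_len=2):
    """Return the cut position with the lowest score, or None if the chain is too short."""
    n = len(contacts)
    candidates = range(min_len, n - min_len + 1)
    if not candidates:
        return None
    return min(candidates, key=lambda k: split_score(contacts, k))


def toy_map():
    """Toy 8-residue map: two fully connected 4-residue blocks plus one cross contact."""
    n = 8
    m = [[0] * n for _ in range(n)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    m[i][j] = 1
    m[3][4] = m[4][3] = 1
    return m


if __name__ == "__main__":
    print(best_boundary(toy_map()))
```

On the toy map the cut between the two blocks leaves one inter-domain contact against six intra-domain contacts on each side, so it scores lowest. Discontinuous domains, which the abstract highlights as the hard case, would need a search over more than one cut position and are not covered by this sketch.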

Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Treponema denticola major surface protein (Msp): a key player in periodontal pathogenicity and immune evasion.

Zhao Y, Chen J, Tian Y, Huang H, Zhao F, Deng X. Arch Microbiol. 2025; 207(2):36.

PMID: 39825920 DOI: 10.1007/s00203-024-04223-w.

Machine Learning Techniques to Infer Protein Structure and Function from Sequences: A Comprehensive Review.

Srivastava G, Liu M, Ni X, Pu L, Brylinski M. Methods Mol Biol. 2024; 2867:79-104.

PMID: 39576576 DOI: 10.1007/978-1-0716-4196-5_5.

Hierarchical Analysis of Protein Structures: From Secondary Structures to Protein Units and Domains.

Perin C, Cretin G, Gelly J. Methods Mol Biol. 2024; 2870:357-370.

PMID: 39543044 DOI: 10.1007/978-1-0716-4213-9_18.

Protein domain embeddings for fast and accurate similarity search.

Iovino B, Tang H, Ye Y. Genome Res. 2024; 34(9):1434-1444.

PMID: 39237301 PMC: 11529836. DOI: 10.1101/gr.279127.124.

Chainsaw: protein domain segmentation with fully convolutional neural networks.

Wells J, Hawkins-Hooker A, Bordin N, Sillitoe I, Paige B, Orengo C. Bioinformatics. 2024; 40(5).

PMID: 38718225 PMC: 11256964. DOI: 10.1093/bioinformatics/btae296.
