FUpred: Detecting Protein Domains Through Deep-learning-based Contact Map Prediction
Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence.
Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries using contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts while minimizing the number of inter-domain contacts in the contact maps. FUpred was tested on a large-scale dataset of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than those of the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins.
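The core idea stated in the Results (favor many intra-domain contacts and few inter-domain contacts) can be illustrated with a minimal sketch. This is not FUpred's actual scoring function: the function name `best_single_cut`, the parameter `min_len`, and the score `inter * (1/intra1 + 1/intra2)` are assumptions chosen for illustration, and the sketch handles only a single cut into two continuous domains on a binary contact map.

```python
def best_single_cut(contacts, min_len=2):
    """Find the cut position that best separates two continuous domains.

    contacts: symmetric square matrix (list of lists) of 0/1 contacts.
    Residues [0, cut) form domain 1 and residues [cut, n) form domain 2.
    The score below is an illustrative assumption, not the published one:
    it is low when few contacts cross the cut and many lie within each side.
    """
    n = len(contacts)
    best_cut, best_score = None, float("inf")
    for cut in range(min_len, n - min_len + 1):
        # Count each residue pair once (upper triangle only).
        intra1 = sum(contacts[i][j] for i in range(cut) for j in range(i + 1, cut))
        intra2 = sum(contacts[i][j] for i in range(cut, n) for j in range(i + 1, n))
        inter = sum(contacts[i][j] for i in range(cut) for j in range(cut, n))
        if intra1 == 0 or intra2 == 0:
            continue  # a side with no internal contacts is not a plausible domain
        score = inter * (1.0 / intra1 + 1.0 / intra2)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Toy contact map: residues 0-2 and 3-5 are each fully connected,
# with a single contact (2, 3) bridging the two groups.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = [[0] * 6 for _ in range(6)]
for i, j in pairs:
    toy[i][j] = toy[j][i] = 1

cut, score = best_single_cut(toy)
print(cut, round(score, 3))
```

On this toy map the scan places the boundary between residues 2 and 3, where only one contact crosses the cut. Discontinuous domains, which the abstract highlights, would require searching over more than one cut position and are outside the scope of this sketch.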
Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.
Supplementary Information: Supplementary data are available at Bioinformatics online.