Improving Protein Fold Recognition Using Triplet Network and Ensemble Deep Learning

Overview

Journal: Brief Bioinform

Publisher: Oxford University Press

Specialty: Biology

Date: 2021 Jul 6

PMID: 34226918

Citations: 2

Authors

Yan Liu

Ke Han

Yi-Heng Zhu

Ying Zhang

Long-Chen Shen

Jiangning Song

Dong-Jun Yu

Abstract

Protein fold recognition is a critical step toward protein structure and function prediction; it aims to identify the most likely fold type of a query protein. In recent years, the development of deep learning (DL) techniques has led to major advances in this field, and the sensitivity of protein fold recognition has improved dramatically. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient, and it rests on the assumption that the bottleneck layer provides a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, which is a deep convolutional network trained with triplet loss. FoldNet directly optimizes the protein fold embedding itself, so that proteins with the same fold type lie closer to each other in the new embedding space than proteins with different fold types. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by the DL-based methods SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, which is ~9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that combining them can further improve fold recognition at the fold level. The web server implementing FoldTR and the benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
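
The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is the generic hinge-style formulation, not necessarily the exact variant used to train FoldNet: the Euclidean distance, the margin value of 1.0, the function names and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin default is an illustrative assumption, not a value from the paper.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D embeddings (made-up numbers, not data from the paper).
anchor = [0.0, 0.0]
same_fold = [0.0, 1.0]    # distance 1 from the anchor
other_fold = [3.0, 4.0]   # distance 5 from the anchor

# Same-fold protein is already much closer than the other-fold one: no penalty.
satisfied = triplet_loss(anchor, same_fold, other_fold)

# Roles swapped: the "positive" is farther away than the "negative", so the loss is large.
violated = triplet_loss(anchor, other_fold, same_fold)
```

The loss is zero once the same-fold pair is closer than the different-fold pair by at least the margin, which is the geometric property the abstract describes for the embedding space.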

Citing Articles

TripletCell: a deep metric learning framework for accurate annotation of cell types at the single-cell level.

Liu Y, Wei G, Li C, Shen L, Gasser R, Song J. Brief Bioinform. 2023; 24(3).

PMID: 37080771 PMC: 10199768. DOI: 10.1093/bib/bbad132.

MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning.

Li H, Pang Y, Liu B, Yu L. Front Pharmacol. 2022; 13:856417.

PMID: 35350759 PMC: 8957949. DOI: 10.3389/fphar.2022.856417.

References

Sheng N, Cui H, Zhang T, Xuan P. Attentional multi-level representation encoding based on convolutional and variance autoencoders for lncRNA-disease association prediction. Brief Bioinform. 2020; 22(3). DOI: 10.1093/bib/bbaa067.

Liu S, Zhang C, Liang S, Zhou Y. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins. 2007; 68(3):636-45. DOI: 10.1002/prot.21459.

Zhu J, Zhang H, Cheng Li S, Wang C, Kong L, Sun S. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts. Bioinformatics. 2017; 33(23):3749-3757. DOI: 10.1093/bioinformatics/btx514.

Cheng J, Baldi P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics. 2006; 22(12):1456-63. DOI: 10.1093/bioinformatics/btl102.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735-80. DOI: 10.1162/neco.1997.9.8.1735.

Tian C, Xu Y, Zuo W. Image denoising using deep CNN with batch renormalization. Neural Netw. 2019; 121:461-473. DOI: 10.1016/j.neunet.2019.08.022.

Li Y, Hu J, Zhang C, Yu D, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics. 2019; 35(22):4647-4655. PMC: 6853658. DOI: 10.1093/bioinformatics/btz291.

Bonomi M, Pellarin R, Vendruscolo M. Simultaneous Determination of Protein Structure and Dynamics Using Cryo-Electron Microscopy. Biophys J. 2018; 114(7):1604-1613. PMC: 5954442. DOI: 10.1016/j.bpj.2018.02.028.

Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2011; 9(2):173-5. DOI: 10.1038/nmeth.1818.

Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010; 26(5):680-2. PMC: 2828112. DOI: 10.1093/bioinformatics/btq003.

Jones D, Taylor W, Thornton J. A new approach to protein fold recognition. Nature. 1992; 358(6381):86-9. DOI: 10.1038/358086a0.

Chen M, Li Y, Zhu Y, Ge F, Yu D. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model. 2020; 60(6):3295-3303. DOI: 10.1021/acs.jcim.9b01207.

Hargbo J, Elofsson A. Hidden Markov models that use predicted secondary structures for fold recognition. Proteins. 1999; 36(1):68-76.

Jones D. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999; 292(2):195-202. DOI: 10.1006/jmbi.1999.3091.

Jo T, Hou J, Eickholt J, Cheng J. Improving Protein Fold Recognition by Deep Learning Networks. Sci Rep. 2015; 5:17573. PMC: 4669437. DOI: 10.1038/srep17573.

Yan K, Fang X, Xu Y, Liu B. Protein fold recognition based on multi-view modeling. Bioinformatics. 2019; 35(17):2982-2990. DOI: 10.1093/bioinformatics/btz040.

Shi J, Blundell T, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol. 2001; 310(1):243-57. DOI: 10.1006/jmbi.2001.4762.

Soding J, Biegert A, Lupas A. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005; 33(Web Server issue):W244-8. PMC: 1160169. DOI: 10.1093/nar/gki408.

Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016; 44(11):e107. PMC: 4914104. DOI: 10.1093/nar/gkw226.

Lindahl E, Elofsson A. Identification of related proteins on family, superfamily and fold level. J Mol Biol. 2000; 295(3):613-25. DOI: 10.1006/jmbi.1999.3377.