
DCSE: Double-Channel-Siamese-Ensemble Model for Protein Protein Interaction Prediction

Overview
Journal: BMC Genomics
Publisher: BioMed Central
Specialty: Genetics
Date: 2022 Aug 3
PMID: 35922751
Abstract

Background: Protein-protein interaction (PPI) is important for many biochemical processes, so accurate PPI prediction can help us better understand the roles that proteins play in these processes. Although many biological methods exist for detecting PPIs, they are time-consuming and lack accuracy, so an efficient and accurate computational model for PPI prediction is needed.

Results: We present a novel sequence-based computational approach, DCSE (Double-Channel-Siamese-Ensemble), to predict potential PPIs. In the encoding layer, we treat each amino acid as a word and map it to an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives using a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). The output of the feature extraction layer is then fed into the prediction layer, which outputs whether the input protein pair will interact with each other. Both MCN and MBC are built on siamese and ensemble network structures, which effectively improve the model's performance. To evaluate our model, we compare it with four machine-learning-based and three deep-learning-based models. The Accuracy, Precision, F1-score, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452 and 0.8609, respectively. For the other seven models, the highest Accuracy, Precision, F1-score, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250 and 0.8572. Our method therefore outperforms the other models on Accuracy, F1-score, Recall and MCC; only its Precision falls below the best competing value. We also test our model on an imbalanced dataset and transfer it to another species, and it performs well in both settings.
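The five criteria quoted above are standard binary-classification metrics. The following Python sketch computes them from true/false positive and negative counts; the helper names are hypothetical and the counts in the usage lines are made up for illustration, not taken from the paper. The last line checks that the third reported value is the F1-score: the harmonic mean of the reported Precision (0.9091) and Recall (0.9452) rounds to 0.9268, matching the abstract.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (a common convention)."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, F1-score, Recall and MCC from binary confusion-matrix counts."""
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    # F1-score: harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient: in [-1, 1]; it uses all four cells,
    # so it stays informative on imbalanced data.
    mcc = _safe_div(tp * tn - fp * fn,
                    math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "precision": precision, "f1": f1,
            "recall": recall, "mcc": mcc}


def f1_from(precision, recall):
    """F1-score computed directly from precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
# Consistency check on the DCSE Precision and Recall quoted above.
print(round(f1_from(0.9091, 0.9452), 4))  # 0.9268
```

Because F1 is a function of Precision and Recall alone, a model can lead on F1 while trailing on Precision, which is exactly the pattern in the reported figures.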

Conclusion: Our model achieves the best overall performance compared with seven other models. The NLP-based encoding method is effective for the PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives, and both feature extraction layers are built on siamese and ensemble network structures. The siamese structure keeps the extracted features of the two proteins consistent, and the ensemble structure effectively improves the model's accuracy.
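The three ideas in the conclusion (residues treated as words, a siamese pair of feature extractors, and an ensemble of two channels) can be illustrated with a toy, dependency-free Python sketch. It is not the DCSE implementation: the random embedding table, the averaging "local" and "global" channels standing in for MCN and MBC, the symmetric pair features, and the probability-averaging ensemble are all hypothetical simplifications. What it does show is the consistency property attributed to the siamese structure: because both proteins pass through the same function with the same parameters, and the pair features are symmetric, the predicted score does not depend on input order.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, each treated as a "word"


def make_embedding(dim, seed=0):
    """Hypothetical residue -> dim-dimensional vector table (random here, learned in practice)."""
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def encode(seq, table):
    """Encoding layer: map each residue ("word") to its vector; unknown letters are skipped."""
    return [table[aa] for aa in seq.upper() if aa in table]


def local_channel(vecs, window=3):
    """Toy stand-in for a convolutional channel: mean over each sliding window, then max-pool."""
    dim = len(vecs[0])
    n_windows = max(1, len(vecs) - window + 1)
    pooled = []
    for i in range(n_windows):
        chunk = vecs[i:i + window]
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return [max(p[d] for p in pooled) for d in range(dim)]


def global_channel(vecs):
    """Toy stand-in for a recurrent channel: mean over the whole sequence."""
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def pair_features(fa, fb):
    """Symmetric combination, so swapping the two proteins cannot change the result."""
    return [x * y for x, y in zip(fa, fb)] + [abs(x - y) for x, y in zip(fa, fb)]


def logistic(features, weights, bias=0.0):
    """Hypothetical prediction head: weighted sum squashed to a probability."""
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


def predict(seq_a, seq_b, table, w_local, w_global):
    """Siamese: each channel applies the SAME function (shared parameters) to both proteins.
    Ensemble (assumed here to be plain averaging): mean of the two channels' probabilities."""
    va, vb = encode(seq_a, table), encode(seq_b, table)
    p_local = logistic(pair_features(local_channel(va), local_channel(vb)), w_local)
    p_global = logistic(pair_features(global_channel(va), global_channel(vb)), w_global)
    return (p_local + p_global) / 2.0


# Usage with made-up sequences and random (untrained) weights.
dim = 8
table = make_embedding(dim)
rng = random.Random(1)
w_local = [rng.uniform(-1, 1) for _ in range(2 * dim)]
w_global = [rng.uniform(-1, 1) for _ in range(2 * dim)]
p_ab = predict("MKTAYIAKQR", "GSHMLEDPVA", table, w_local, w_global)
p_ba = predict("GSHMLEDPVA", "MKTAYIAKQR", table, w_local, w_global)
print(p_ab, p_ba)  # identical values: the score is order-independent
```

In a real model the embedding table, channel parameters and head weights would be trained; the sketch only demonstrates how weight sharing plus a symmetric combination makes the pair representation consistent.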

Citing Articles

Prediction of Protein-Protein Interactions Based on Integrating Deep Learning and Feature Fusion.

Tran H, Nguyen P, Guo F, Wang J Int J Mol Sci. 2024; 25(11).

PMID: 38892007 PMC: 11172432. DOI: 10.3390/ijms25115820.


GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs.

Zeng X, Meng F, Wen M, Li S, Li Y BMC Genomics. 2024; 25(1):406.

PMID: 38724906 PMC: 11080243. DOI: 10.1186/s12864-024-10299-x.


Deep learning in structural bioinformatics: current applications and future perspectives.

Kumar N, Srivastava R Brief Bioinform. 2024; 25(3).

PMID: 38701422 PMC: 11066934. DOI: 10.1093/bib/bbae042.


Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review.

Lee M Molecules. 2023; 28(13).

PMID: 37446831 PMC: 10343845. DOI: 10.3390/molecules28135169.


On the choice of negative examples for prediction of host-pathogen protein interactions.

Neumann D, Roy S, Minhas F, Ben-Hur A Front Bioinform. 2023; 2:1083292.

PMID: 36591335 PMC: 9798088. DOI: 10.3389/fbinf.2022.1083292.
