IBPred: A Sequence-based Predictor for Identifying Ion Binding Protein in Phage

Overview

Journal Comput Struct Biotechnol J

Specialty Biotechnology

Date 2022 Sep 23

PMID 36147670

Authors

Shi-Shi Yuan

Dong Gao

Xue-Qin Xie

Cai-Yi Ma

Wei Su

Zhao-Yue Zhang

Yan Zheng

Hui Ding

Abstract

Ion binding proteins (IBPs) interact with ions selectively and non-covalently. IBPs in phages also play an important role in biological processes. Accurate identification of IBPs is therefore necessary for understanding their biological functions and the molecular mechanisms that involve ion binding. Because experimental molecular biology methods for identifying IBPs remain labor-intensive and costly, computational methods that identify IBPs quickly and efficiently are helpful. In this work, a random forest (RF)-based model was constructed to identify IBPs. Based on protein sequence information and the physicochemical properties of residues, dipeptide composition combined with the physicochemical correlation between two residues was proposed for feature extraction. A feature selection technique, analysis of variance (ANOVA), was used to exclude redundant information. Comparison with other classification methods demonstrated that the method could identify IBPs accurately. Based on the model, a Python package named IBPred was built; its source code can be accessed at https://github.com/ShishiYuan/IBPred.
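Two of the steps named in the abstract, dipeptide composition and ANOVA-based feature scoring, can be illustrated with a minimal Python sketch. This is not the IBPred package's code or API: the function names `dipeptide_composition` and `anova_f_score` are hypothetical, the physicochemical correlation features and the random forest classifier are omitted, and the sequences in the usage lines are made up for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs that contain non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f_score(positive_values, negative_values):
    """One-way ANOVA F statistic of a single feature across two classes."""
    groups = [positive_values, negative_values]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the class means around the grand mean.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own class mean.
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)


# Toy usage with made-up sequences (not data from the paper).
positives = [dipeptide_composition(s) for s in ("HHCCHHCC", "CHHCCHHC")]
negatives = [dipeptide_composition(s) for s in ("AAGGAAGG", "GAAGGAAG")]
scores = [
    anova_f_score([v[j] for v in positives], [v[j] for v in negatives])
    for j in range(len(DIPEPTIDES))
]
# Indices of features ordered from most to least class-discriminative.
ranked = sorted(range(len(DIPEPTIDES)), key=lambda j: scores[j], reverse=True)
```

A higher F score means a feature's class means differ more relative to the spread within each class, so keeping only the top-ranked features is one way to drop redundant dimensions before training a classifier.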

Citing Articles

TCellPredX: A Novel Approach for Accurate Prediction of Hepatitis C Virus Linear T Cell Epitopes.

Ge F, Li H, Zhang M, Arif M, Alam T ACS Omega. 2025; 9(52):51494-51507.

PMID: 39758636 PMC: 11696426. DOI: 10.1021/acsomega.4c08715.

ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning.

Pham N, Terrance A, Jeon Y, Rakkiyappan R, Manavalan B Mol Ther Nucleic Acids. 2024; 35(2):102192.

PMID: 38779332 PMC: 11108997. DOI: 10.1016/j.omtn.2024.102192.

Accurately identifying hemagglutinin using sequence information and machine learning methods.

Zou X, Ren L, Cai P, Zhang Y, Ding H, Deng K Front Med (Lausanne). 2023; 10:1281880.

PMID: 38020152 PMC: 10644030. DOI: 10.3389/fmed.2023.1281880.

A First Computational Frame for Recognizing Heparin-Binding Protein.

Zhu W, Yuan S, Li J, Huang C, Lin H, Liao B Diagnostics (Basel). 2023; 13(14).

PMID: 37510209 PMC: 10377868. DOI: 10.3390/diagnostics13142465.

Antimicrobial Peptides Prediction method based on sequence multidimensional feature embedding.

Dong B, Li M, Jiang B, Gao B, Li D, Zhang T Front Genet. 2022; 13:1069558.

PMID: 36468005 PMC: 9714691. DOI: 10.3389/fgene.2022.1069558.
