DNA Binding Protein Identification by Combining Pseudo Amino Acid Composition and Profile-based Protein Representation

Overview

Journal Sci Rep

Specialty Science

Date 2015 Oct 21

PMID 26482832

Citations 42

Authors

Bin Liu

Shanyi Wang

Xiaolong Wang

Abstract

DNA-binding proteins play an important role in most cellular processes. It is therefore necessary to develop an efficient predictor that identifies DNA-binding proteins based only on the sequence information of proteins. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification, and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, by combining these features with Support Vector Machines (SVMs), we proposed a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
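
To make the feature construction mentioned in the abstract more concrete, the following is a minimal sketch of a PseAAC feature vector in the general form introduced by Chou (2001): 20 normalized amino acid frequencies followed by lam sequence-order correlation factors. It is not the authors' implementation. The function name pse_aac, the defaults lam=3 and w=0.05, the example sequence, and the use of the Kyte-Doolittle hydropathy index as the only physicochemical property are assumptions made for illustration. The profile-based representation and the SVM classifier described in the abstract are omitted.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as a stand-in property
# (an assumption; the abstract does not say which properties were used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(prop):
    """Standardize a property table to zero mean and unit std over the 20 residues."""
    mu = mean(prop.values())
    sd = pstdev(prop.values())
    return {aa: (value - mu) / sd for aa, value in prop.items()}


def pse_aac(seq, lam=3, w=0.05, props=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional PseAAC vector for a protein sequence.

    lam is the number of sequence-order correlation tiers and w is the
    weight of those tiers relative to the plain composition.
    """
    # Keep only the 20 standard residues; anything else is skipped.
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must contain more than lam standard residues")

    tables = [normalize(p) for p in props]

    # Conventional amino acid composition (20 frequencies).
    freqs = [residues.count(aa) / n for aa in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(n - k):
            a, b = residues[i], residues[i + k]
            total += mean((t[b] - t[a]) ** 2 for t in tables)
        thetas.append(total / (n - k))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    # Made-up example sequence, not taken from the benchmark dataset.
    vector = pse_aac("MKRLSTEELAKKHGVSRSTIYRWLKEG", lam=3)
    print(len(vector))
```

A vector of this kind, computed per protein, would be the input to a classifier such as an SVM. Raising lam captures longer-range sequence-order effects at the cost of more features and requires sequences longer than lam.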

Citing Articles

Systematic discovery of DNA-binding tandem repeat proteins.

Hu X, Zhang X, Sun W, Liu C, Deng P, Cao Y. Nucleic Acids Res. 2024; 52(17):10464-10489.

PMID: 39189466 PMC: 11417379. DOI: 10.1093/nar/gkae710.

LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning.

Zhu Y, Sun A. Front Genet. 2024; 15:1411847.

PMID: 38903752 PMC: 11188361. DOI: 10.3389/fgene.2024.1411847.

ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Pradhan U, Meher P, Naha S, Das R, Gupta A, Parsad R. Protein Sci. 2024; 33(6):e5015.

PMID: 38747369 PMC: 11094783. DOI: 10.1002/pro.5015.

Protein feature engineering framework for AMPylation site prediction.

Prabhu H, Bhosale H, Sane A, Dhadwal R, Ramakrishnan V, Valadi J. Sci Rep. 2024; 14(1):8695.

PMID: 38622194 PMC: 11369087. DOI: 10.1038/s41598-024-58450-8.

HormoNet: a deep learning approach for hormone-drug interaction prediction.

Emami N, Ferdousi R. BMC Bioinformatics. 2024; 25(1):87.

PMID: 38418979 PMC: 10903040. DOI: 10.1186/s12859-024-05708-7.
