
DNA Binding Protein Identification by Combining Pseudo Amino Acid Composition and Profile-based Protein Representation

Overview
Journal Sci Rep
Specialty Science
Date 2015 Oct 21
PMID 26482832
Citations 42
Abstract

DNA-binding proteins play an important role in most cellular processes. It is therefore necessary to develop an efficient predictor that identifies DNA-binding proteins based only on their sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, these features were combined with Support Vector Machines (SVMs) to build a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) into the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
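To make the feature representation more concrete, here is a minimal Python sketch of the standard type-1 PseAAC descriptor computed over a profile-derived sequence. It is not the iDNAPro-PseAAC implementation. The abstract does not state which physicochemical properties, correlation depth λ (`lam`), or weight `w` were used, so those are assumptions here, and the toy property scale is a placeholder rather than a published table. The helper `profile_to_sequence` is hypothetical: it assumes the profile-based representation replaces each residue with the most frequent amino acid in the corresponding row of a frequency profile (for example, one derived from a PSI-BLAST search).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def standardize(values):
    """Scale one property (dict residue -> value) to mean 0 and unit population SD."""
    mean = sum(values[a] for a in AMINO_ACIDS) / 20
    sd = math.sqrt(sum((values[a] - mean) ** 2 for a in AMINO_ACIDS) / 20)
    return {a: (values[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=3, w=0.05):
    """Type-1 PseAAC: 20 composition terms followed by `lam` sequence-order terms.

    Assumes `seq` contains only the 20 standard residues; `lam` and `w`
    are illustrative defaults, not values taken from iDNAPro-PseAAC.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def theta(r1, r2):
        # Mean squared difference of the standardized properties of two residues.
        return sum((p[r2] - p[r1]) ** 2 for p in props) / len(props)

    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]  # normalized composition
    # Tier-k correlation factor: average theta over all residue pairs k apart.
    thetas = [
        sum(theta(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def profile_to_sequence(profile):
    """Hypothetical profile-based representation: each row holds 20 frequencies
    in AMINO_ACIDS order; keep the most frequent residue at each position."""
    return "".join(AMINO_ACIDS[max(range(20), key=row.__getitem__)] for row in profile)


if __name__ == "__main__":
    # Placeholder property values, NOT a real physicochemical scale.
    toy_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
    # One-hot toy profile, so the derived sequence equals the input string.
    profile = [[1.0 if a == c else 0.0 for a in AMINO_ACIDS] for c in "MKTAYIAKQR"]
    seq = profile_to_sequence(profile)
    features = pseaac(seq, [toy_scale], lam=2)
    print(len(features))  # 22
```

Because every feature shares the same denominator, the 20 + λ values sum to 1, and `w` controls how strongly the sequence-order terms weigh against plain composition. In a real pipeline these vectors would then be fed to an SVM classifier.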

Citing Articles

Systematic discovery of DNA-binding tandem repeat proteins.

Hu X, Zhang X, Sun W, Liu C, Deng P, Cao Y. Nucleic Acids Res. 2024; 52(17):10464-10489.

PMID: 39189466 PMC: 11417379. DOI: 10.1093/nar/gkae710.


LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning.

Zhu Y, Sun A. Front Genet. 2024; 15:1411847.

PMID: 38903752 PMC: 11188361. DOI: 10.3389/fgene.2024.1411847.


ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Pradhan U, Meher P, Naha S, Das R, Gupta A, Parsad R. Protein Sci. 2024; 33(6):e5015.

PMID: 38747369 PMC: 11094783. DOI: 10.1002/pro.5015.


Protein feature engineering framework for AMPylation site prediction.

Prabhu H, Bhosale H, Sane A, Dhadwal R, Ramakrishnan V, Valadi J. Sci Rep. 2024; 14(1):8695.

PMID: 38622194 PMC: 11369087. DOI: 10.1038/s41598-024-58450-8.


HormoNet: a deep learning approach for hormone-drug interaction prediction.

Emami N, Ferdousi R. BMC Bioinformatics. 2024; 25(1):87.

PMID: 38418979 PMC: 10903040. DOI: 10.1186/s12859-024-05708-7.

