
Boosting the Prediction and Understanding of DNA-binding Domains from Sequence

Overview
Specialty Biochemistry
Date 2010 Feb 17
PMID 20156993
Citations 21
Abstract

DNA-binding proteins perform vital functions related to transcription, repair and replication. We have developed a new sequence-based machine learning protocol to identify DNA-binding proteins. We compare our method with an extensive benchmark of previously published structure-based machine learning methods as well as a standard sequence alignment technique, BLAST. Furthermore, we elucidate important feature interactions found in a learned model and analyze how specific rules capture general mechanisms that extend across DNA-binding motifs. This analysis is carried out using the malibu machine learning workbench available at http://proteomics.bioengr.uic.edu/malibu and the corresponding data sets and features are available at http://proteomics.bioengr.uic.edu/dna.
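As a rough illustration of what "sequence-based" features can look like, the sketch below computes amino-acid composition from a raw protein sequence. The abstract does not state which features the protocol uses, so this choice is an assumption made for illustration only; the function names `composition_features` and `positive_fraction` are hypothetical and are not part of malibu or the published data sets.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in the sequence.

    Hypothetical helper: an illustrative feature set, not the authors' own.
    Non-standard characters (e.g. 'X' or gaps) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def positive_fraction(sequence: str) -> float:
    """Fraction of lysine (K) and arginine (R), the positively charged
    residues often found at protein-DNA interfaces."""
    feats = composition_features(sequence)
    return feats["K"] + feats["R"]


if __name__ == "__main__":
    toy_sequence = "MKRKAAL"  # made-up sequence, not real data
    print(composition_features(toy_sequence)["K"])
    print(positive_fraction(toy_sequence))
```

A vector like this, one per protein, could be passed to any standard classifier; which learner and which features the published protocol actually uses should be taken from the full paper rather than from this sketch.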

Citing Articles

Benchmarking recent computational tools for DNA-binding protein identification.

Luo X, Chi A, Lin A, Ong T, Wong L, Rahman C Brief Bioinform. 2024; 26(1).

PMID: 39657630 PMC: 11630855. DOI: 10.1093/bib/bbae634.


ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Pradhan U, Meher P, Naha S, Das R, Gupta A, Parsad R Protein Sci. 2024; 33(6):e5015.

PMID: 38747369 PMC: 11094783. DOI: 10.1002/pro.5015.


DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features.

Barukab O, Khan Y, Khan S, Chou K Appl Bionics Biomech. 2022; 2022:5483115.

PMID: 35465187 PMC: 9020926. DOI: 10.1155/2022/5483115.


PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method.

Wang J, Zheng H, Yang Y, Xiao W, Liu T Biomed Res Int. 2020; 2020:7297631.

PMID: 32352006 PMC: 7174956. DOI: 10.1155/2020/7297631.


HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection.

Sang X, Xiao W, Zheng H, Yang Y, Liu T Comput Math Methods Med. 2020; 2020:1384749.

PMID: 32300371 PMC: 7142336. DOI: 10.1155/2020/1384749.

