DNA-binding Protein Prediction Using Plant Specific Support Vector Machines: Validation and Application of a New Genome Annotation Tool

Overview
Specialty Biochemistry
Date 2015 Aug 26
PMID 26304539
Citations 12
Abstract

There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable to genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available, and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
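The abstract does not state which sequence features or which SVM implementation were used. As a hedged illustration only, the sketch below shows one common way to turn a protein sequence into a fixed-length feature vector (amino-acid composition, an assumption here and not confirmed by the source) and how an accuracy figure such as the reported 81% versus 74% is computed from predictions. The names `composition` and `accuracy` are hypothetical helpers, not part of the published tool.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: amino-acid composition is used purely for illustration;
    the source does not say which features the model relies on.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognisable residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

Vectors produced this way could be passed to any SVM library for training; comparing a species-specific and a generic model would then amount to calling `accuracy` on each model's predictions for the same held-out set of proteins.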

Citing Articles

Accurate prediction of nucleic acid binding proteins using protein language model.

Wu S, Xu J, Guo J. Bioinform Adv. 2025; 5(1):vbaf008.

PMID: 39990254 PMC: 11845279. DOI: 10.1093/bioadv/vbaf008.


Improved prediction of DNA and RNA binding proteins with deep learning models.

Wu S, Guo J. Brief Bioinform. 2024; 25(4).

PMID: 38856168 PMC: 11163377. DOI: 10.1093/bib/bbae285.


ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Pradhan U, Meher P, Naha S, Das R, Gupta A, Parsad R. Protein Sci. 2024; 33(6):e5015.

PMID: 38747369 PMC: 11094783. DOI: 10.1002/pro.5015.


RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes.

Pradhan U, Naha S, Das R, Gupta A, Parsad R, Meher P. Comput Struct Biotechnol J. 2024; 23:1631-1640.

PMID: 38660008 PMC: 11039349. DOI: 10.1016/j.csbj.2024.04.034.


Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches.

Guo J, Malik F. Biomolecules. 2022; 12(9).

PMID: 36139026 PMC: 9496475. DOI: 10.3390/biom12091187.

