
Proteome-wide Prediction of Novel DNA/RNA-binding Proteins Using Amino Acid Composition and Periodicity in the Hyperthermophilic Archaeon Pyrococcus furiosus

Overview
Journal DNA Res
Date 2007 Jun 19
PMID 17573465
Citations 6
Abstract

Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is thus becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we applied the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, using amino acid composition and periodicities as feature vectors. We defined the values obtained as the composition (CO) score and the periodicity (PD) score. The P. furiosus proteins were classified into three classes (I-III) on the basis of a two-dimensional correlation analysis of the CO score and the PD score. As a result, approximately 87% of the functionally known proteins classified as class I (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. When the two-dimensional correlation analysis was applied to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. The DNA/RNA-binding activities of randomly chosen hypothetical proteins were then tested experimentally. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activities, supporting the efficacy of our method.
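
As a rough illustration of two ideas named in the abstract, the sketch below computes an amino acid composition vector for a sequence and applies the stated class I rule (CO score + PD score > 0.6). It is a minimal sketch, not the authors' pipeline: the function names, the residue ordering, and the handling of non-standard characters are assumptions, and the periodicity features, SVM training, and the boundaries of classes II and III are not given in the abstract and are therefore not shown.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code; this ordering is an
# arbitrary choice made for the example, not one taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard residue in the sequence.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def is_class_one(co_score: float, pd_score: float, threshold: float = 0.6) -> bool:
    """Apply the class I rule quoted in the abstract: CO + PD > 0.6.

    The scores themselves are assumed to come from separately trained
    classifiers, which this sketch does not implement.
    """
    return co_score + pd_score > threshold


if __name__ == "__main__":
    features = amino_acid_composition("MKRLAEKLGISRATVSR")  # made-up peptide
    print(len(features), round(sum(features), 6))
    print(is_class_one(0.4, 0.3))
```

A composition vector of this kind could serve as the input to any standard SVM implementation; the choice of kernel and training set would need to follow the full paper rather than this sketch.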

Citing Articles

Prediction of RNA binding proteins comes of age from low resolution to high resolution.

Zhao H, Yang Y, Zhou Y. Mol Biosyst. 2013; 9(10):2417-25.

PMID: 23872922 PMC: 3870025. DOI: 10.1039/c3mb70167k.


From face to interface recognition: a differential geometric approach to distinguish DNA from RNA binding surfaces.

Shazman S, Elber G, Mandel-Gutfreund Y. Nucleic Acids Res. 2011; 39(17):7390-9.

PMID: 21693557 PMC: 3177183. DOI: 10.1093/nar/gkr395.


Boosting the prediction and understanding of DNA-binding domains from sequence.

Langlois R, Lu H. Nucleic Acids Res. 2010; 38(10):3149-58.

PMID: 20156993 PMC: 2879530. DOI: 10.1093/nar/gkq061.


Identification of protein functions using a machine-learning approach based on sequence-derived properties.

Lee B, Shin M, Oh Y, Oh H, Ryu K. Proteome Sci. 2009; 7:27.

PMID: 19664241 PMC: 2731080. DOI: 10.1186/1477-5956-7-27.


Characterization of a heat-stable enzyme possessing GTP-dependent RNA ligase activity from a hyperthermophilic archaeon, Pyrococcus furiosus.

Kanai A, Sato A, Fukuda Y, Okada K, Matsuda T, Sakamoto T. RNA. 2009; 15(3):420-31.

PMID: 19155324 PMC: 2657004. DOI: 10.1261/rna.1122109.

