
A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers

Overview
Journal: Genes (Basel)
Publisher: MDPI
Date: 2018 Aug 4
PMID: 30071697
Citations: 11
Abstract

Various machine learning-based approaches that use sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes such as DNA replication, DNA repair and DNA modification. In these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most critical tasks. Understanding the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely coupled models for predicting DNA-binding proteins. The framework integrates multi-view features, including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and auto-cross covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely coupled classifiers such as support vector machines (SVM) and random forests. A logistic regression model is then applied to evaluate the contributions of the individual classifiers and to make the final prediction. On the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, it achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework can flexibly orchestrate various predictive models with good performance.
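The stacking idea described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' implementation. The feature views are random placeholders that only borrow the names from the abstract (all dimensions except 188 are arbitrary), the helper names `make_synthetic_views`, `base_models` and `out_of_fold_meta_features` are hypothetical, and the use of out-of-fold probabilities as meta-features is a common stacking practice that the abstract does not confirm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_views(n_samples=200, seed=0):
    """Toy labels plus four random feature views (placeholders, not real descriptors)."""
    rng = np.random.default_rng(seed)
    y = rng.permutation(np.arange(n_samples) % 2)  # balanced binary labels
    # Only the 188 is taken from the feature name; the other sizes are arbitrary.
    dims = {"Local_DPP": 12, "188D": 188, "PSSM_DWT": 16, "AC_Struc": 10}
    views = {}
    for name, dim in dims.items():
        X = rng.normal(size=(n_samples, dim))
        X[:, 0] += 1.5 * y  # inject a weak class signal so the models can learn something
        views[name] = X
    return views, y


def base_models():
    """One fresh SVM and one fresh random forest (hyperparameters are assumptions)."""
    return {
        "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
        "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    }


def out_of_fold_meta_features(views, y, n_splits=5):
    """Train every (view, classifier) pair and collect out-of-fold class-1 probabilities."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    columns, names = [], []
    for view_name, X in views.items():
        for model_name, model in base_models().items():
            proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
            columns.append(proba[:, 1])
            names.append(f"{view_name}+{model_name}")
    return np.column_stack(columns), names


views, y = make_synthetic_views()
Z, names = out_of_fold_meta_features(views, y)

# Second level: logistic regression over the base-model probabilities.
meta = LogisticRegression().fit(Z, y)
for name, weight in zip(names, meta.coef_[0]):
    print(f"{name:>14s}  weight={weight:+.3f}")
```

In this sketch the logistic regression coefficients give a rough indication of how much each view and classifier pair contributes to the final prediction, which mirrors the role the abstract assigns to the second-level model. On real data the placeholder views would be replaced by actual feature matrices computed from the protein sequences.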

Citing Articles

SNARER: new molecular descriptors for SNARE proteins classification.

Auriemma Citarella A, Di Biasi L, Risi M, Tortora G. BMC Bioinformatics. 2022; 23(1):148.

PMID: 35462533 PMC: 9035248. DOI: 10.1186/s12859-022-04677-z.


The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning.

Chen Z, Jiao S, Zhao D, Zou Q, Xu L, Zhang L. Front Cell Dev Biol. 2022; 10:845622.

PMID: 35178393 PMC: 8844512. DOI: 10.3389/fcell.2022.845622.


Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.

Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Front Genet. 2022; 12:821996.

PMID: 35154264 PMC: 8837382. DOI: 10.3389/fgene.2021.821996.


A sequence-based multiple kernel model for identifying DNA-binding proteins.

Qian Y, Jiang L, Ding Y, Tang J, Guo F. BMC Bioinformatics. 2021; 22(Suppl 3):291.

PMID: 34058979 PMC: 8167993. DOI: 10.1186/s12859-020-03875-x.


Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning.

Li G, Du X, Li X, Zou L, Zhang G, Wu Z. PeerJ. 2021; 9:e11262.

PMID: 33986992 PMC: 8101451. DOI: 10.7717/peerj.11262.

