DPI_CDF: Druggable Protein Identifier Using Cascade Deep Forest

Overview
Publisher BioMed Central
Specialty Biology
Date 2024 Apr 5
PMID 38580921
Abstract

Background: Drug targets in living organisms play a pivotal role in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.

Methods: In this study, we developed DPI_CDF, a novel deep learning-based model for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model.
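The fusion-then-cascade idea described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy and scikit-learn are available, stands in random arrays for the three encodings (the names `hog_pssm`, `cpsr`, and `nqlc` and their dimensions are hypothetical), and uses a generic cascade in which each level's forests pass out-of-fold class probabilities, concatenated with the fused features, to the next level. The number of levels, trees, and folds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def fit_cascade(X, y, n_levels=2, seed=0):
    """Fit a small cascade of forests; returns a list of fitted levels."""
    levels = []
    augmented = X
    for level in range(n_levels):
        forests = [
            RandomForestClassifier(n_estimators=50, random_state=seed + level),
            ExtraTreesClassifier(n_estimators=50, random_state=seed + level),
        ]
        # Out-of-fold class probabilities become extra features for the
        # next level, which limits leakage of the training labels.
        probas = [
            cross_val_predict(f, augmented, y, cv=3, method="predict_proba")
            for f in forests
        ]
        for f in forests:
            f.fit(augmented, y)
        levels.append(forests)
        augmented = np.hstack([X] + probas)
    return levels


def predict_cascade(levels, X):
    """Run samples through every level and average the last level's forests."""
    augmented = X
    for forests in levels:
        probas = [f.predict_proba(augmented) for f in forests]
        augmented = np.hstack([X] + probas)
    mean_proba = np.mean(probas, axis=0)
    return levels[-1][0].classes_[mean_proba.argmax(axis=1)]


# Placeholder encodings (random numbers, NOT real protein features).
rng = np.random.default_rng(0)
n_proteins = 120
hog_pssm = rng.normal(size=(n_proteins, 8))  # hypothetical evolutionary block
cpsr = rng.normal(size=(n_proteins, 5))      # hypothetical physicochemical block
nqlc = rng.normal(size=(n_proteins, 4))      # hypothetical compositional block

# Fuse the three encoding schemes by simple concatenation.
X = np.hstack([hog_pssm, cpsr, nqlc])
# Toy labels that depend on one column from each block.
y = (hog_pssm[:, 0] + cpsr[:, 0] + nqlc[:, 0] > 0).astype(int)

levels = fit_cascade(X, y)
predictions = predict_cascade(levels, X)
```

Concatenation is only one way to fuse encodings; the abstract does not state how DPI_CDF combines them internally, so the fusion step here should be read as an assumption.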

Results: Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process.
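For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and MCC from the four cells of a binary confusion matrix using their standard definitions. The counts in the usage lines are invented for illustration and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
example_accuracy = accuracy(45, 45, 5, 5)  # 0.9
example_mcc = mcc(45, 45, 5, 5)            # 0.8
```

Because MCC uses all four cells, it stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy.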

Availability: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF

Citing Articles

Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.

Shoombuatong W, Mookdarsanit P, Mookdarsanit L, Schaduangrat N, Ahmed S, Kabir M Interdiscip Sci. 2025; .

PMID: 40067411 DOI: 10.1007/s12539-025-00696-5.


Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy.

Shoombuatong W, Schaduangrat N, Homdee N, Ahmed S, Chumnanpuen P Sci Rep. 2025; 15(1):4762.

PMID: 39922825 PMC: 11807091. DOI: 10.1038/s41598-024-81807-y.


StackAHTPs: An explainable antihypertensive peptides identifier based on heterogeneous features and stacked learning approach.

Ghulam A, Arif M, Unar A, A Thafar M, Albaradei S, Worachartcheewan A IET Syst Biol. 2025; 19(1):e70002.

PMID: 39905861 PMC: 11794993. DOI: 10.1049/syb2.70002.


TargetCLP: clathrin proteins prediction combining transformed and evolutionary scale modeling-based multi-view features via weighted feature integration approach.

Ullah M, Akbar S, Raza A, Khan K, Zou Q Brief Bioinform. 2025; 26(1.

PMID: 39844339 PMC: 11753890. DOI: 10.1093/bib/bbaf026.


Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration.

Noor S, Naseem A, Awan H, Aslam W, Khan S, AlQahtani S BMC Bioinformatics. 2024; 25(1):360.

PMID: 39563239 PMC: 11577875. DOI: 10.1186/s12859-024-05978-1.
