DPI_CDF: Druggable Protein Identifier Using Cascade Deep Forest
Background: Drug targets play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.
Methods: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical (i.e., component protein sequence representation), and compositional (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical (cascade) deep forest model then fuses these three encoding schemes to build the proposed model, DPI_CDF.
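The cascade idea described in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes the three encodings are fused by simple concatenation, and it uses a hypothetical nearest-centroid scorer as a stand-in for the forests of a real deep forest. All function names and the toy feature values are assumptions made for illustration. What the sketch does show is the defining step of a cascade: each level's class-probability outputs are appended to the original fused features before the next level is trained.

```python
from math import exp

def fuse(evolutionary, physicochemical, compositional):
    """Fuse three encodings by concatenation (an assumption of this sketch)."""
    return list(evolutionary) + list(physicochemical) + list(compositional)

def fit_centroid(X, y):
    """Hypothetical stand-in for a forest: returns x -> [p(class 0), p(class 1)]."""
    dims = len(X[0])
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def predict_proba(x):
        # Softmax over negative squared distances to the two class centroids.
        dist = [sum((a - b) ** 2 for a, b in zip(x, cents[k])) for k in (0, 1)]
        m = min(dist)
        w = [exp(-(v - m)) for v in dist]
        s = sum(w)
        return [v / s for v in w]

    return predict_proba

def fit_cascade(X, y, n_levels=2, fitters=(fit_centroid,)):
    """Train levels in sequence, augmenting inputs with the previous level's outputs."""
    levels = []
    current = [list(x) for x in X]
    for _ in range(n_levels):
        models = [fit(current, y) for fit in fitters]
        levels.append(models)
        # Next-level input = original fused features + this level's class vectors.
        current = [list(x) + [p for m in models for p in m(c)]
                   for x, c in zip(X, current)]
    return levels

def predict_cascade(levels, x):
    """Pass one sample through every level; average the last level's outputs."""
    current = list(x)
    probs = None
    for models in levels:
        outs = [m(current) for m in models]
        probs = [sum(o[k] for o in outs) / len(outs) for k in (0, 1)]
        current = list(x) + [p for o in outs for p in o]
    return probs

# Toy data: made-up one-dimensional encodings per feature group.
X = [fuse([0.0], [0.1], [0.0]), fuse([0.1], [0.0], [0.2]),
     fuse([1.0], [0.9], [1.0]), fuse([0.9], [1.0], [0.8])]
y = [0, 0, 1, 1]
levels = fit_cascade(X, y)
probs = predict_cascade(levels, fuse([0.9], [0.8], [1.0]))
```

A practical deep forest would typically use several forests of different types per level and generate the training-time class vectors by cross-fitting to limit overfitting; both are omitted here to keep the sketch short.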
Results: Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization ability of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process.
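For readers less familiar with the two reported metrics, the following short sketch computes accuracy and MCC from binary confusion-matrix counts using their standard definitions. The counts in the example are hypothetical and are not taken from the study.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only.
acc_example = accuracy(tp=45, tn=45, fp=5, fn=5)  # 0.9
mcc_example = mcc(tp=45, tn=45, fp=5, fn=5)       # 0.8
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced.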
Availability: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF.