Augmented Drug Combination Dataset to Improve the Performance of Machine Learning Models Predicting Synergistic Anticancer Effects

Overview
Journal: Res Sq
Date: 2023 Nov 14
PMID: 37961281
Abstract

Combination therapy has gained popularity in cancer treatment because it enhances treatment efficacy and helps overcome drug resistance. Although machine learning (ML) techniques have become indispensable tools for discovering new drug combinations, the drug combination data currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to scale up the existing anticancer drug synergy dataset in an unbiased manner. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. With this protocol, we expanded the AZ-DREAM Challenges dataset from 8,798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that Random Forest and Gradient Boosting Trees models trained on the augmented data achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating larger and more diverse drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
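The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity metric, the similarity cutoff, or how synergy labels are assigned to new instances. The sketch assumes a precomputed similarity lookup (`similar`), a hypothetical `threshold` of 0.9, and that a substituted combination inherits the synergy score of the instance it was derived from. All drug and cell line names are placeholders.

```python
from typing import Dict, List, Set, Tuple

# One synergy instance: (drug_a, drug_b, cell_line, synergy_score).
Record = Tuple[str, str, str, float]
# Hypothetical lookup: drug -> list of (candidate_substitute, similarity in [0, 1]).
Similar = Dict[str, List[Tuple[str, float]]]


def augment(records: List[Record], similar: Similar,
            threshold: float = 0.9) -> List[Record]:
    """Return the original records plus substituted variants.

    Assumption (not from the source): a substituted combination inherits
    the synergy score of the record it was derived from.
    """
    out: List[Record] = []
    seen: Set[Tuple[Tuple[str, str], str]] = set()

    def add(a: str, b: str, cell: str, score: float) -> None:
        if a == b:  # skip degenerate self-combinations
            return
        key = (tuple(sorted((a, b))), cell)  # drug order does not matter
        if key not in seen:
            seen.add(key)
            out.append((a, b, cell, score))

    # Keep every original instance first so it takes precedence over
    # any augmented instance that maps to the same combination.
    for a, b, cell, score in records:
        add(a, b, cell, score)

    # Substitute one drug at a time with sufficiently similar molecules.
    for a, b, cell, score in records:
        for sub, sim in similar.get(a, []):
            if sim >= threshold:
                add(sub, b, cell, score)
        for sub, sim in similar.get(b, []):
            if sim >= threshold:
                add(a, sub, cell, score)
    return out


# Placeholder data for illustration only.
records = [("drugA", "drugB", "cell1", 12.5)]
similar = {
    "drugA": [("drugA2", 0.95), ("drugX", 0.40)],
    "drugB": [("drugB2", 0.92)],
}
augmented = augment(records, similar)
```

Here one original instance yields two additional instances, because `drugX` falls below the assumed cutoff. Substituting only one drug at a time keeps each new instance close to a measured combination; whether the published protocol also substitutes both drugs simultaneously is not stated in the abstract.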
