Bioactive Molecule Prediction Using Extreme Gradient Boosting

Overview
Journal Molecules
Publisher MDPI
Specialty Biology
Date 2016 Aug 3
PMID 27483216
Citations 52
Abstract

Following the explosive growth in chemical and biological data, the shift from traditional drug discovery methods to computer-aided ones has made data mining and machine learning integral parts of today's drug discovery process. In this paper, extreme gradient boosting (Xgboost), an ensemble of Classification and Regression Trees (CART) and a variant of the Gradient Boosting Machine, was investigated for predicting biological activity from a quantitative description of a compound's molecular structure. Seven datasets that are well known in the literature were used, and the experimental results show that Xgboost can outperform machine learning algorithms such as Random Forest (RF), Support Vector Machines (LSVM), Radial Basis Function Neural Network (RBFN) and Naïve Bayes (NB) in predicting biological activities. In addition to detecting minority activity classes in highly imbalanced datasets, it performed remarkably well on both high- and low-diversity datasets.
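The abstract describes Xgboost as a gradient-boosted ensemble of CART trees. The sketch below illustrates that idea in plain Python under strong simplifying assumptions: each tree is a depth-1 stump, the features are binary fingerprint-style bits, and the data are synthetic. The function names, hyperparameters, and dataset are hypothetical and do not come from the paper. The leaf weight -G/(H + lambda) and the split score follow the second-order formulation commonly given for XGBoost, but this is not the library's implementation and it omits most of its features (deeper trees, the gamma penalty, subsampling, missing-value handling).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grad, hess, lam=1.0):
    """Choose the binary feature whose split gives the largest second-order gain."""
    G, H = sum(grad), sum(hess)
    best = None
    for j in range(len(X[0])):
        # Gradient and Hessian sums for samples where bit j is 0 (left leaf).
        GL = sum(g for x, g in zip(X, grad) if x[j] == 0)
        HL = sum(h for x, h in zip(X, hess) if x[j] == 0)
        GR, HR = G - GL, H - HL
        gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
        if best is None or gain > best[0]:
            # Regularised Newton-step leaf weights: w = -G / (H + lambda).
            best = (gain, j, -GL / (HL + lam), -GR / (HR + lam))
    _, j, w_left, w_right = best
    return j, w_left, w_right


def fit_boosted_stumps(X, y, n_rounds=30, eta=0.3, lam=1.0):
    """Additively fit stumps to the logistic loss, one stump per round."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]      # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]          # second derivative of log loss
        j, wl, wr = fit_stump(X, grad, hess, lam)
        wl, wr = eta * wl, eta * wr                   # shrink by the learning rate
        stumps.append((j, wl, wr))
        scores = [s + (wl if x[j] == 0 else wr) for s, x in zip(scores, X)]
    return stumps


def predict_proba(stumps, x):
    """Probability of the 'active' class for one bit vector."""
    return sigmoid(sum(wl if x[j] == 0 else wr for j, wl, wr in stumps))


# Synthetic stand-in for fingerprint data: a compound is labelled "active"
# when bits 2 and 5 are both set (an arbitrary rule chosen for illustration).
random.seed(0)
X = [[random.randint(0, 1) for _ in range(8)] for _ in range(200)]
y = [1 if (x[2] == 1 and x[5] == 1) else 0 for x in X]

model = fit_boosted_stumps(X, y)
preds = [1 if predict_proba(model, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Each round fits one small tree to the gradient and curvature of the current loss and adds a shrunken copy of it to the running score, which is the additive structure the abstract refers to. Reproducing the paper's experiments would require the actual library and the seven benchmark datasets.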

Citing Articles

Development of a machine learning model in prediction of the rapid progression of interstitial lung disease in patients with idiopathic inflammatory myopathy.

Qiang Y, Wang H, Ni Y, Wang J, Liu A, Yang H Quant Imaging Med Surg. 2024; 14(12):9258-9275.

PMID: 39698644 PMC: 11652001. DOI: 10.21037/qims-24-595.


Enhanced labor pain monitoring using machine learning and ECG waveform analysis for uterine contraction-induced pain.

Chu Y, Chen S, Chen K, Sun J, Shen T, Chen L BioData Min. 2024; 17(1):32.

PMID: 39243100 PMC: 11380346. DOI: 10.1186/s13040-024-00383-z.


Machine Learning-Based Personalized Prediction of Hepatocellular Carcinoma Recurrence After Radiofrequency Ablation.

Sato M, Tateishi R, Moriyama M, Fukumoto T, Yamada T, Nakagomi R Gastro Hep Adv. 2024; 1(1):29-37.

PMID: 39129938 PMC: 11308827. DOI: 10.1016/j.gastha.2021.09.003.


Subcellular Feature-Based Classification of α and β Cells Using Soft X-ray Tomography.

Deshmukh A, Chang K, Cuala J, Vanslembrouck B, Georgia S, Loconte V Cells. 2024; 13(10).

PMID: 38786091 PMC: 11119489. DOI: 10.3390/cells13100869.


Explainable AI for CHO cell culture media optimization and prediction of critical quality attribute.

Gangwar N, Balraj K, Rathore A Appl Microbiol Biotechnol. 2024; 108(1):308.

PMID: 38656382 PMC: 11043154. DOI: 10.1007/s00253-024-13147-w.

