Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Overview
Journal Molecules
Publisher MDPI
Specialty Biology
Date 2019 Aug 4
PMID 31374986
Citations 32
Abstract

Machine learning classification algorithms are widely used to predict and classify molecular properties such as toxicity or biological activity. Predicting whether a molecule is toxic or non-toxic is important because testing on living animals has both ethical and cost drawbacks. The quality of classification models can be assessed with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust yet sensitive sum of ranking differences (SRD) and analysis of variance (ANOVA) were used for evaluation. The effect of dataset composition (balanced vs. imbalanced) and of 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
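
The sensitivity of performance metrics to dataset composition can be illustrated with a small sketch. It is not taken from the study: the confusion-matrix counts are hypothetical, chosen so that one imagined 2-class classifier keeps the same sensitivity (0.8) and specificity (0.9) on a balanced and on an imbalanced dataset. The function name `two_class_metrics` and the choice of metrics shown are assumptions made for illustration only.

```python
import math


def two_class_metrics(tp, fn, fp, tn):
    """Compute common 2-class metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    # Cohen's kappa: agreement corrected for chance agreement
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts: same classifier behaviour, different class ratios.
# Balanced set: 500 toxic, 500 non-toxic molecules.
balanced = two_class_metrics(tp=400, fn=100, fp=50, tn=450)
# Imbalanced set: 100 toxic, 900 non-toxic molecules.
imbalanced = two_class_metrics(tp=80, fn=20, fp=90, tn=810)

for name in balanced:
    print(f"{name:18s} balanced={balanced[name]:.3f} "
          f"imbalanced={imbalanced[name]:.3f}")
```

In this toy setting, balanced accuracy is identical for both datasets because it depends only on sensitivity and specificity, whereas accuracy, precision, Cohen's kappa, and MCC all change with the class ratio, even though the classifier itself behaves the same way.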

Citing Articles

Machine learning assisted classification RASAR modeling for the nephrotoxicity potential of a curated set of orally active drugs.

Banerjee A, Roy K Sci Rep. 2025; 15(1):808.

PMID: 39755865 PMC: 11700179. DOI: 10.1038/s41598-024-85063-y.


Machine learning algorithms able to predict the prognosis of gastric cancer patients treated with immune checkpoint inhibitors.

Li H, Zhu Z, Sun Y, Yuan C, Wang M, Wang N World J Gastroenterol. 2024; 30(40):4354-4366.

PMID: 39494097 PMC: 11525865. DOI: 10.3748/wjg.v30.i40.4354.


CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products.

Du M, Ren Y, Zhang Y, Li W, Yang H, Chu H Interdiscip Sci. 2024; 17(1):27-41.

PMID: 39348072 DOI: 10.1007/s12539-024-00656-5.


In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach.

Noor M, Haq M, Chowdhury M, Tayara H, Shim H, Chong K Pharmaceuticals (Basel). 2024; 17(9).

PMID: 39338272 PMC: 11434943. DOI: 10.3390/ph17091107.


The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset.

Banerjee A, Roy K Sci Rep. 2024; 14(1):20812.

PMID: 39242880 PMC: 11379871. DOI: 10.1038/s41598-024-71892-4.

