Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Overview

Journal: Molecules

Publisher: MDPI

Specialty: Biology

Date: 2019 Aug 4

PMID: 31374986

Citations: 32

Authors

Anita Racz

David Bajusz

Karoly Heberger

Abstract

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules, such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important because testing on living animals has both ethical and cost drawbacks. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
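To illustrate how performance metrics can disagree on an imbalanced 2-class dataset, the following is a minimal Python sketch. It is not the protocol used in the study. The function name `binary_metrics` and the confusion-matrix counts (90 non-toxic and 10 toxic molecules) are hypothetical and chosen only for illustration. The metrics shown (accuracy, balanced accuracy, Matthews correlation coefficient, Cohen's kappa) are common choices and are not necessarily the exact set compared in the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute several performance metrics from 2-class confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n

    # Per-class recall; guard against an empty class.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2

    # Matthews correlation coefficient; set to 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical imbalanced test set: 90 non-toxic (negative), 10 toxic (positive).
# "trivial" predicts every molecule as non-toxic; "modest" finds half of the toxic ones.
trivial = binary_metrics(tp=0, fp=0, tn=90, fn=10)
modest = binary_metrics(tp=5, fp=5, tn=85, fn=5)

for name, result in (("trivial", trivial), ("modest", modest)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In this sketch both classifiers reach the same accuracy, while the chance-corrected metrics separate them, which is one way dataset composition can change how models rank.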

Citing Articles

Machine learning assisted classification RASAR modeling for the nephrotoxicity potential of a curated set of orally active drugs.

Banerjee A, Roy K. Sci Rep. 2025; 15(1):808.

PMID: 39755865 PMC: 11700179. DOI: 10.1038/s41598-024-85063-y.

Machine learning algorithms able to predict the prognosis of gastric cancer patients treated with immune checkpoint inhibitors.

Li H, Zhu Z, Sun Y, Yuan C, Wang M, Wang N. World J Gastroenterol. 2024; 30(40):4354-4366.

PMID: 39494097 PMC: 11525865. DOI: 10.3748/wjg.v30.i40.4354.

CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products.

Du M, Ren Y, Zhang Y, Li W, Yang H, Chu H. Interdiscip Sci. 2024; 17(1):27-41.

PMID: 39348072 DOI: 10.1007/s12539-024-00656-5.

In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach.

Noor M, Haq M, Chowdhury M, Tayara H, Shim H, Chong K. Pharmaceuticals (Basel). 2024; 17(9).

PMID: 39338272 PMC: 11434943. DOI: 10.3390/ph17091107.

The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset.

Banerjee A, Roy K. Sci Rep. 2024; 14(1):20812.

PMID: 39242880 PMC: 11379871. DOI: 10.1038/s41598-024-71892-4.

References

Rudin C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat Mach Intell. 2022; 1(5):206-215. PMC: 9122117. DOI: 10.1038/s42256-019-0048-x.

Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019; 18(6):463-477. PMC: 6552674. DOI: 10.1038/s41573-019-0024-5.

Sheridan R, Singh S, Fluder E, Kearsley S. Protocols for bridging the peptide to nonpeptide gap in topological similarity searches. J Chem Inf Comput Sci. 2001; 41(5):1395-406. DOI: 10.1021/ci0100144.

Nicholls A. Confidence limits, error bars and method comparison in molecular modeling. Part 1: the calculation of confidence intervals. J Comput Aided Mol Des. 2014; 28(9):887-918. PMC: 4175406. DOI: 10.1007/s10822-014-9753-z.

Czodrowski P. Count on kappa. J Comput Aided Mol Des. 2014; 28(11):1049-55. DOI: 10.1007/s10822-014-9759-6.

Kairys V, Fernandes M, Gilson M. Screening drug-like compounds by docking to homology models: a systematic study. J Chem Inf Model. 2006; 46(1):365-79. DOI: 10.1021/ci050238c.

Racz A, Bajusz D, Heberger K. Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters. SAR QSAR Environ Res. 2015; 26(7-9):683-700. DOI: 10.1080/1062936X.2015.1084647.

Truchon J, Bayly C. Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model. 2007; 47(2):488-508. DOI: 10.1021/ci600426e.

Piir G, Kahn I, Garcia-Sosa A, Sild S, Ahte P, Maran U. Best Practices for QSAR Model Reporting: Physical and Chemical Properties, Ecotoxicity, Environmental Fate, Human Health, and Toxicokinetics Endpoints. Environ Health Perspect. 2018; 126(12):126001. PMC: 6371683. DOI: 10.1289/EHP3264.

Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discov Today. 2018; 23(6):1241-1250. DOI: 10.1016/j.drudis.2018.01.039.

Matthews B. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta. 1975; 405(2):442-51. DOI: 10.1016/0005-2795(75)90109-9.

Racz A, Bajusz D, Heberger K. Modelling methods and cross-validation variants in QSAR: a multi-level analysis. SAR QSAR Environ Res. 2018; 29(9):661-674. DOI: 10.1080/1062936X.2018.1505778.

Bajusz D, Racz A, Heberger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform. 2015; 7:20. PMC: 4456712. DOI: 10.1186/s13321-015-0069-3.

Andric F, Bajusz D, Racz A, Segan S, Heberger K. Multivariate assessment of lipophilicity scales-computational and reversed phase thin-layer chromatographic indices. J Pharm Biomed Anal. 2016; 127:81-93. DOI: 10.1016/j.jpba.2016.04.001.

Racz A, Bajusz D, Heberger K. Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR. Mol Inform. 2019; 38(8-9):e1800154. PMC: 6767540. DOI: 10.1002/minf.201800154.