
Differences in Learning Characteristics Between Support Vector Machine and Random Forest Models for Compound Classification Revealed by Shapley Value Analysis

Overview
Journal: Sci Rep
Specialty: Science
Date: 2023 Apr 12
PMID: 37045972
Abstract

The random forest (RF) and support vector machine (SVM) methods are mainstays of molecular machine learning (ML) and compound property prediction. We explored in detail how binary classification models derived with these algorithms arrive at their predictions. To this end, we applied approaches from explainable artificial intelligence (XAI), in particular the Shapley value concept from game theory, which we adapted and further extended for our analysis. In large-scale, activity-based compound classification with models derived from training sets of increasing size, RF and SVM (using the Tanimoto kernel) produced very similar predictions that were hardly distinguishable. However, Shapley value analysis revealed that their learning characteristics differed systematically and that chemically intuitive explanations of accurate RF and SVM predictions had different origins.
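The abstract combines two technical ingredients: an SVM using the Tanimoto kernel and Shapley values as feature attributions. The Python sketch below illustrates both on a toy scale only; it is not the authors' adapted or extended Shapley method, and it does not cover RF models. Assumptions, all introduced for illustration: fingerprints are represented as sets of on-bit indices; the support vectors, dual coefficients, and intercept are made-up values rather than a trained model; and the value function "switches off" fingerprint bits outside a coalition. The helper names `tanimoto`, `svm_decision`, `exact_shapley`, and `model` are hypothetical. Exact enumeration is exponential in the number of on-bits, so this only suits very small examples.

```python
from itertools import combinations
from math import factorial


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)  # equals |a| + |b| - |a & b|
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(a & b) / union


def svm_decision(x, support_vectors, dual_coefs, intercept):
    """Kernel SVM decision function: sum_i coef_i * K(x, sv_i) + b, with coef_i = alpha_i * y_i."""
    return sum(c * tanimoto(x, sv) for sv, c in zip(support_vectors, dual_coefs)) + intercept


def exact_shapley(x, f):
    """Exact Shapley value of each on-bit of x for the model f.

    Assumed value function: v(S) = f(S), i.e. on-bits of x outside the
    coalition S are switched off. Runtime grows as 2^n in the number of on-bits.
    """
    players = sorted(x)
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # sizes 0 .. n-1 of coalitions that exclude player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (f(s | {i}) - f(s))  # weighted marginal contribution of bit i
        phi[i] = total
    return phi


# Hypothetical toy model: support vectors, coefficients, and intercept are made up.
SUPPORT_VECTORS = [frozenset({1, 2, 5}), frozenset({3, 4})]
DUAL_COEFS = [1.0, -0.5]
INTERCEPT = 0.1


def model(s):
    return svm_decision(s, SUPPORT_VECTORS, DUAL_COEFS, INTERCEPT)


x = frozenset({1, 2, 3})
phi = exact_shapley(x, model)
print(phi)
# Efficiency property: the contributions sum to f(x) - f(empty fingerprint).
print(sum(phi.values()), model(x) - model(frozenset()))
```

Representing fingerprints as sets keeps the Tanimoto intersection and union to one line each. In this toy model, bits 1 and 2 play symmetric roles and therefore receive equal Shapley values. For real data, one would more likely compute a kernel matrix from library fingerprints and pass it to an SVM implementation that accepts precomputed kernels.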

Citing Articles

GCN-Based Framework for Materials Screening and Phase Identification.

Qin Z, Luo Q, Qin W, Chen X, Zhang H, Wong C. Materials (Basel). 2025; 18(5).

PMID: 40077185. PMC: 11901163. DOI: 10.3390/ma18050959.


Predicting antipsychotic responsiveness using a machine learning classifier trained on plasma levels of inflammatory markers in schizophrenia.

Yee J, Phua S, See Y, Andiappan A, Goh W, Lee J. Transl Psychiatry. 2025; 15(1):51.

PMID: 39952924. PMC: 11828904. DOI: 10.1038/s41398-025-03264-z.


Improving the explainability of autoencoder factors for commodities through forecast-based Shapley values.

Cerqueti R, Iovanella A, Mattera R, Storani S. Sci Rep. 2024; 14(1):19622.

PMID: 39179618. PMC: 11344066. DOI: 10.1038/s41598-024-70342-5.


A Machine Learning-Based Mortality Prediction Model for Patients with Chronic Hepatitis C Infection: An Exploratory Study.

Al Alawi A, Al Shuaili H, Al-Naamani K, Al Naamani Z, Al-Busafi S. J Clin Med. 2024; 13(10).

PMID: 38792479. PMC: 11121813. DOI: 10.3390/jcm13102939.


Machine Learning Approaches Identify Chemical Features for Stage-Specific Antimalarial Compounds.

van Heerden A, Turon G, Duran-Frigola M, Pillay N, Birkholtz L. ACS Omega. 2023; 8(46):43813-43826.

PMID: 38027377. PMC: 10666252. DOI: 10.1021/acsomega.3c05664.
