
Comparison and Improvement of the Predictability and Interpretability with Ensemble Learning Models in QSPR Applications

Overview
Journal: J Cheminform
Publisher: BioMed Central
Specialty: Chemistry
Date: 2021 Jan 12
PMID: 33430997
Citations: 18
Abstract

Ensemble learning improves machine learning results by combining several models, which yields better predictive performance than a single model. It also benefits and accelerates research in quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling. As the number of ensemble learning models such as random forest grows, the effectiveness of QSAR/QSPR will be limited by the models' inability to explain their predictions to researchers. In fact, many implementations of ensemble learning models can quantify the overall magnitude of each feature's contribution. For example, feature importance lets us assess the relative importance of features and interpret the predictions. However, different ensemble learning methods or implementations may select different features for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting, and gradient boosting) on regression and binary classification tasks. We then built blending methods by combining the four ensemble learning methods. The blending method gave better performance and a unified interpretation by summarizing the individual predictions of the different learning models. The important features in two case studies, which provided valuable information about compound properties, are discussed in detail. QSPR modeling with interpretable machine learning techniques can make chemical design more efficient, help confirm hypotheses, and establish knowledge for better results.
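As an illustration of the workflow the abstract describes, the following is a minimal sketch in Python with scikit-learn. It fits the four ensemble regressors named above, blends their predictions, and merges their feature importances into a single ranking. Several things here are assumptions and do not come from the paper: the synthetic data from `make_regression` stands in for a real descriptor matrix, the hyperparameters are arbitrary, and the blend is a plain unweighted average, which may differ from the blending scheme the authors actually used.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a molecular descriptor matrix (assumption: not the paper's data).
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=4, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The four ensemble learners named in the abstract; hyperparameters are illustrative.
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "adaboost": AdaBoostRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(n_estimators=100, random_state=0),
}

predictions = {}
importances = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    # Rescale each model's importances to sum to 1 so they are comparable.
    imp = model.feature_importances_
    importances[name] = imp / imp.sum()

# Assumed blending rule: unweighted mean of predictions and of importances.
blended_prediction = np.mean(list(predictions.values()), axis=0)
blended_importance = np.mean(list(importances.values()), axis=0)

for name, pred in predictions.items():
    print(f"{name:>18}: R2 = {r2_score(y_test, pred):.3f}")
print(f"{'blend':>18}: R2 = {r2_score(y_test, blended_prediction):.3f}")

# Feature indices ordered from most to least important under the blend.
ranking = np.argsort(blended_importance)[::-1]
print("Features ranked by blended importance:", ranking.tolist())
```

Comparing the per-model entries in `importances` with `blended_importance` shows where the individual learners disagree on a feature's rank, which is the inconsistency the blended interpretation is meant to smooth out. For the binary classification tasks mentioned in the abstract, the corresponding `*Classifier` estimators could be substituted, with predicted probabilities averaged instead of predicted values.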

Citing Articles

Predictive model for abdominal liposuction volume in patients with obesity using machine learning in a longitudinal multi-center study in Korea.

Sang H, Park J, Kim S, Lee M, Lee H, Lee S Sci Rep. 2024; 14(1):29791.

PMID: 39616163 PMC: 11608244. DOI: 10.1038/s41598-024-79654-y.


Application of hybridized ensemble learning and equilibrium optimization in estimating damping ratios of municipal solid waste.

Moghaddam H, Keramati M, Bahrami A, Ghanizadeh A, Amlashi A, Isleem H Sci Rep. 2024; 14(1):17584.

PMID: 39080333 PMC: 11289416. DOI: 10.1038/s41598-024-67381-3.


Mining Bovine Milk Proteins for DPP-4 Inhibitory Peptides Using Machine Learning and Virtual Proteolysis.

Zhang Y, Zhu Y, Bao X, Dai Z, Shen Q, Wang L Research (Wash D C). 2024; 7:0391.

PMID: 38887277 PMC: 11182572. DOI: 10.34133/research.0391.


Designing Sustainable Hydrophilic Interfaces via Feature Selection from Molecular Descriptors and Time-Domain Nuclear Magnetic Resonance Relaxation Curves.

Okada M, Amamoto Y, Kikuchi J Polymers (Basel). 2024; 16(6).

PMID: 38543429 PMC: 10975876. DOI: 10.3390/polym16060824.


Research on predicting the driving forces of digital transformation in Chinese media companies based on machine learning.

Wang Z, Li Y, Zhao X, Wang Y, Xiao Z Sci Rep. 2024; 14(1):7286.

PMID: 38538765 PMC: 10973445. DOI: 10.1038/s41598-024-57873-7.

