Random Forest Models to Predict Aqueous Solubility

Overview
Date 2007 Jan 24
PMID 17238260
Citations 53
Abstract

Random Forest (RF) regression, Partial Least Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL Drug Data Report, on the basis of molecular descriptors selected by the regression analysis.
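The two test-set statistics quoted in the abstract, r² and RMSE, can be illustrated with a minimal sketch. The abstract does not say which definition of r² was used; this sketch assumes the coefficient of determination (1 minus the ratio of residual to total sum of squares), not the squared Pearson correlation. The function names and the log S values are hypothetical and are not taken from the paper.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (here log S)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical log molar solubility values, for illustration only.
observed_log_s = [-1.0, -2.0, -3.0, -4.0]
predicted_log_s = [-1.5, -2.0, -2.5, -4.0]

print(f"r2   = {r_squared(observed_log_s, predicted_log_s):.2f}")
print(f"RMSE = {rmse(observed_log_s, predicted_log_s):.2f} log S units")
```

Because RMSE is expressed in log S units, an RMSE of 0.69 corresponds to a typical error factor of roughly 10^0.69, or about 5, in molar solubility.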

Citing Articles

Development of Predictive Statistical Model for Gaining Valuable Insights in Pharmaceutical Product Recalls.

Bhatt J, Morris K, Haware R. AAPS PharmSciTech. 2024; 25(8):255.

PMID: 39443361 DOI: 10.1208/s12249-024-02970-z.


Determination of morphine sulfate anti-pain drug solubility in supercritical CO₂ with machine learning method.

Sodeifian G, Hsieh C, Masihpour F, Tabibzadeh A, Jiang R, Cheng Y. Sci Rep. 2024; 14(1):22370.

PMID: 39333248 PMC: 11437171. DOI: 10.1038/s41598-024-73543-0.


Transfer learning and wavelength selection method in NIR spectroscopy to predict glucose and lactate concentrations in culture media using VIP-Boruta.

Kaneko H, Kono S, Nojima A, Kambayashi T. Anal Sci Adv. 2024; 2(9-10):470-479.

PMID: 38716444 PMC: 10989590. DOI: 10.1002/ansa.202000177.


Cross-validated permutation feature importance considering correlation between features.

Kaneko H. Anal Sci Adv. 2024; 3(9-10):278-287.

PMID: 38716264 PMC: 10989554. DOI: 10.1002/ansa.202200018.


Estimation and visualization of process states using latent variable models based on Gaussian process.

Kaneko H. Anal Sci Adv. 2024; 2(5-6):326-333.

PMID: 38716160 PMC: 10989668. DOI: 10.1002/ansa.202000122.