
The Parameter Sensitivity of Random Forests

Overview
Publisher Biomed Central
Specialty Biology
Date 2016 Sep 3
PMID 27586051
Citations 24
Abstract

Background: The Random Forest (RF) algorithm for supervised machine learning is an ensemble learning method widely used in science and many other fields. Its popularity has been increasing, but relatively few studies address the parameter selection process, a critical step in model fitting. Because of numerous assertions that the default parameters perform reliably, many RF models are fit using these values. However, there has not yet been a thorough examination of the parameter sensitivity of RFs in computational genomic studies. We address this gap here.

Results: We examined the effects of parameter selection on classification performance using the RF machine learning algorithm on two biological datasets with distinct p/n ratios: sequencing summary statistics (low p/n) and microarray-derived data (high p/n). Here, p refers to the number of variables and n to the number of samples. Our findings demonstrate that parameterization is highly correlated with prediction accuracy and variable importance measures (VIMs). Further, we demonstrate that different parameters are critical in tuning different datasets, and that parameter optimization significantly improves on the default parameters.

Conclusions: Performance varied widely across parameter settings on both low and high p/n data. Therefore, there is significant benefit to be gained by tuning RF models away from their default parameter settings.
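
To make "tuning away from the defaults" concrete, the following is a minimal sketch of how a candidate parameter grid might be laid out around a set of defaults. It rests on several assumptions: the abstract does not name the RF implementation, so the parameter names (`mtry`, `ntree`, `nodesize`) and the classification default `mtry = floor(sqrt(p))` are borrowed from R's randomForest package; the helper names, the multipliers, and the example values of p and n are hypothetical and are not taken from the study.

```python
import math
from itertools import product


def p_over_n(p, n):
    """Ratio of variables (p) to samples (n), as defined in the abstract."""
    return p / n


def default_mtry(p):
    """Assumed classification default (R's randomForest): floor(sqrt(p))."""
    return max(1, math.isqrt(p))


def tuning_grid(p, ntree_values=(250, 500, 1000), nodesize_values=(1, 5, 10)):
    """Hypothetical grid: mtry at multiples of its default, capped to [1, p]."""
    base = default_mtry(p)
    mtry_values = sorted(
        {max(1, min(p, round(base * factor))) for factor in (0.5, 1, 2, 4)}
    )
    return [
        {"mtry": m, "ntree": t, "nodesize": s}
        for m, t, s in product(mtry_values, ntree_values, nodesize_values)
    ]


if __name__ == "__main__":
    # Illustrative values only; not the dimensions of the paper's datasets.
    p, n = 100, 1000
    grid = tuning_grid(p)
    print(f"p/n = {p_over_n(p, n):.2f}, default mtry = {default_mtry(p)}")
    print(f"{len(grid)} candidate settings, first: {grid[0]}")
```

Each dictionary in the grid would then be passed to whatever RF implementation is in use and scored by cross-validation or out-of-bag error; the default setting is included in the grid so that tuned and default performance can be compared directly.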

Citing Articles

Estimation of elbow flexion torque from anthropometric and NMES MMG variables using random forest regression.

Uwamahoro R, Sundaraj K, Feroz F Sci Rep. 2025; 15(1):8038.

PMID: 40055347 PMC: 11889151. DOI: 10.1038/s41598-024-81504-w.


Clinical Response Characteristics of Salivary Proteins in the Management Strategy of Diabetes-Associated Periodontitis.

Jia S, Liang Q, Zhang Y, Diao J, Liu Y, Ye Y J Proteome Res. 2025; 24(3):1161-1179.

PMID: 40008981 PMC: 11895774. DOI: 10.1021/acs.jproteome.4c00701.


Dengue dynamics, predictions, and future increase under changing monsoon climate in India.

Sophia Y, Roxy M, Murtugudde R, Karipot A, Sapkota A, Dasgupta P Sci Rep. 2025; 15(1):1637.

PMID: 39837878 PMC: 11750985. DOI: 10.1038/s41598-025-85437-w.


A machine learning approach for modeling the occurrence of the major intermediate hosts for schistosomiasis in East Africa.

Tabo Z, Breuer L, Fabia C, Samuel G, Albrecht C Sci Rep. 2024; 14(1):4274.

PMID: 38383705 PMC: 10881506. DOI: 10.1038/s41598-024-54699-1.


Learning Financial Networks with High-frequency Trade Data.

Karpman K, Basu S, Easley D, Kim S Data Sci Sci. 2024; 2(1).

PMID: 38249160 PMC: 10798789. DOI: 10.1080/26941899.2023.2166624.

