A Comparison of the Conditional Inference Survival Forest Model to Random Survival Forests Based on a Simulation Study As Well As on Two Applications with Time-to-event Data

Overview

Journal BMC Med Res Methodol

Publisher BioMed Central

Specialties General Medicine
Health Services

Date 2017 Jul 30

PMID 28754093

Citations 32

Authors

Justine B Nasejje

Henry Mwambi

Keertan Dheda

Maia Lesosky

Abstract

Background: Random survival forest (RSF) models have been proposed as alternatives to the Cox proportional hazards model for analysing time-to-event data. These methods, however, have been criticised for a bias towards covariates with many split points, and conditional inference forests for time-to-event data have therefore been suggested. Conditional inference forests (CIF) are known to correct this bias in RSF models by separating the selection of the best covariate to split on from the search for the best split point within the selected covariate.
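The split-selection bias described above can be reproduced with a small null simulation. The sketch below is an illustration under stated assumptions, not the authors' code or the randomForestSRC or party/partykit implementations. It compares (a) an exhaustive, RSF-style rule that picks the covariate with the largest log-rank statistic over all of its cut points with (b) a two-step, conditional-inference-style rule that first picks the covariate with the smallest p-value from a linear test on log-rank scores (standardised by its permutation mean and variance) and only afterwards would search for a cut point within that covariate. The sample size, the exponential event and censoring times, the binary and 10-level covariates, and the treatment of the 10-level covariate as ordered are all hypothetical choices.

```python
import math
import random


def logrank_chi2(times, events, left):
    """Two-sample log-rank chi-square statistic (1 df) for one candidate split.

    `left` is a list of booleans flagging subjects sent to the left node.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_risk, n1_risk = len(times), sum(left)
    o_minus_e = var = 0.0
    k = 0
    while k < len(order):
        t = times[order[k]]
        d = d1 = m = m1 = 0
        while k < len(order) and times[order[k]] == t:  # group tied times
            i = order[k]
            m += 1
            m1 += left[i]
            if events[i]:
                d += 1
                d1 += left[i]
            k += 1
        if d and n_risk > 1:
            p1 = n1_risk / n_risk
            o_minus_e += d1 - d * p1
            var += d * p1 * (1 - p1) * (n_risk - d) / (n_risk - 1)
        n_risk -= m
        n1_risk -= m1
    return o_minus_e ** 2 / var if var > 0 else 0.0


def max_split_stat(x, times, events):
    """RSF-style exhaustive search: largest log-rank statistic over all cut points."""
    cuts = sorted(set(x))[:-1]
    return max((logrank_chi2(times, events, [v <= c for v in x]) for c in cuts),
               default=0.0)


def logrank_scores(times, events):
    """Log-rank scores: event indicator minus Nelson-Aalen cumulative hazard at t_i."""
    cumhaz, h = {}, 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        h += sum(1 for u, e in zip(times, events) if u == t and e) / at_risk
        cumhaz[t] = h
    return [float(e) - cumhaz[t] for t, e in zip(times, events)]


def linear_test_pvalue(x, h):
    """Two-step rule, step 1: test x against the scores h without searching cut points.

    Standardises T = sum(x_i * h_i) by its permutation mean and variance and
    returns an asymptotic chi-square(1) p-value.
    """
    n = len(x)
    x_bar, h_bar = sum(x) / n, sum(h) / n
    t_stat = sum(xi * hi for xi, hi in zip(x, h))
    var = (sum((xi - x_bar) ** 2 for xi in x)
           * sum((hi - h_bar) ** 2 for hi in h) / (n - 1))
    if var == 0:
        return 1.0
    c = (t_stat - n * x_bar * h_bar) ** 2 / var
    return math.erfc(math.sqrt(c / 2))  # P(chi-square_1 > c)


def selection_rates(reps=200, n=100, seed=1):
    """Share of null replicates in which the 10-level covariate is selected."""
    rng = random.Random(seed)
    exhaustive = two_step = 0
    for _ in range(reps):
        # Hypothetical null design: neither covariate is related to survival.
        x_two = [rng.randint(0, 1) for _ in range(n)]  # one possible cut point
        x_ten = [rng.randint(0, 9) for _ in range(n)]  # up to nine cut points (ordered)
        event_t = [rng.expovariate(1.0) for _ in range(n)]
        cens_t = [rng.expovariate(0.5) for _ in range(n)]
        times = [min(a, b) for a, b in zip(event_t, cens_t)]
        events = [a <= b for a, b in zip(event_t, cens_t)]
        if max_split_stat(x_ten, times, events) > max_split_stat(x_two, times, events):
            exhaustive += 1
        h = logrank_scores(times, events)
        if linear_test_pvalue(x_ten, h) < linear_test_pvalue(x_two, h):
            two_step += 1
    return exhaustive / reps, two_step / reps
```

Calling `selection_rates()` returns the two selection shares. Under this null design the exhaustive rule should pick the 10-level covariate in well over half of the replicates, while the two-step rule should pick it roughly half the time; the exact values depend on the seed and settings.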

Methods: In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analyse two real time-to-event datasets. The first is based on the survival of children under five years of age in Uganda and consists of categorical covariates, most of which have more than two levels (many split points). The second is based on the survival of patients with extensively drug-resistant tuberculosis (XDR TB) and consists mainly of categorical covariates with two levels (few split points).

Results: Based on bootstrap cross-validated estimates of the integrated Brier score, the conditional inference forest model is superior to random survival forest models for time-to-event data whose covariates have many split points. However, conditional inference forests perform comparably to random survival forest models for time-to-event data whose covariates have fewer split points.
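The performance measure named above can be sketched as follows. This assumes the inverse-probability-of-censoring-weighted (IPCW) Brier score of Graf and colleagues, averaged over [0, tau] with the trapezoidal rule, and a bootstrap cross-validation loop that fits on a bootstrap sample and scores the out-of-bag subjects. A covariate-free Kaplan-Meier curve stands in for a fitted forest, and the evaluation grid, `tau`, the number of bootstrap samples and the estimation of the censoring weights on the out-of-bag data are hypothetical simplifications; packages such as the R package pec handle these details in their own way.

```python
import random
from bisect import bisect_left, bisect_right


def kaplan_meier(times, flags):
    """Kaplan-Meier step function for the 'events' marked True in `flags`.

    Returns (jump_times, values); values[k] is the estimate just after jump_times[k].
    """
    jump_times, values, s = [], [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, f in zip(times, flags) if u == t and f)
        if d:
            s *= 1.0 - d / at_risk
            jump_times.append(t)
            values.append(s)
    return jump_times, values


def step_value(step, t, left_limit=False):
    """Evaluate a Kaplan-Meier step function at t, or just before t."""
    jump_times, values = step
    k = bisect_left(jump_times, t) if left_limit else bisect_right(jump_times, t)
    return values[k - 1] if k > 0 else 1.0


def brier_score(t, times, events, surv_at, G):
    """IPCW Brier score at time t; surv_at(i, t) is the predicted S(t | x_i)."""
    total = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        s = surv_at(i, t)
        if ti <= t and ei:      # event observed by t: ideal prediction is 0
            total += s ** 2 / step_value(G, ti, left_limit=True)
        elif ti > t:            # still event-free at t: ideal prediction is 1
            total += (1.0 - s) ** 2 / step_value(G, t)
        # subjects censored before t contribute 0; the weights compensate
    return total / len(times)


def integrated_brier_score(times, events, surv_at, tau, grid_size=100):
    """Brier score averaged over [0, tau] with the trapezoidal rule."""
    G = kaplan_meier(times, [not e for e in events])  # censoring survival
    grid = [tau * k / grid_size for k in range(grid_size + 1)]
    bs = [brier_score(t, times, events, surv_at, G) for t in grid]
    area = sum((bs[k] + bs[k + 1]) / 2 * (grid[k + 1] - grid[k])
               for k in range(grid_size))
    return area / tau


def bootstrap_cv_ibs(times, events, tau, n_boot=20, seed=0):
    """Fit on bootstrap samples, score on out-of-bag subjects, average the IBS.

    The 'model' is a covariate-free Kaplan-Meier curve standing in for a
    fitted forest (hypothetical placeholder).
    """
    rng = random.Random(seed)
    n = len(times)
    scores = []
    for _ in range(n_boot):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        if not oob:
            continue
        km = kaplan_meier([times[i] for i in boot], [events[i] for i in boot])
        scores.append(integrated_brier_score(
            [times[i] for i in oob], [events[i] for i in oob],
            lambda i, t: step_value(km, t), tau))
    return sum(scores) / len(scores) if scores else float("nan")
```

Lower values are better: a prediction that matches every observed outcome scores 0, and a constant prediction of 0.5 on uncensored data scores 0.25. Comparing two forests would mean replacing the Kaplan-Meier placeholder with each forest's predicted survival curves and comparing their bootstrap cross-validated scores.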

Conclusion: Although survival forests are promising methods for analysing time-to-event data, it is important to identify the best forest model for the analysis based on the nature of the covariates in the dataset in question.

Citing Articles

Combining dosiomics and machine learning methods for predicting severe cardiac diseases in childhood cancer survivors: the French Childhood Cancer Survivor Study.

Bentriou M, Letort V, Chounta S, Fresneau B, Do D, Haddy N Front Oncol. 2024; 14:1241221.

PMID: 39687880 PMC: 11647004. DOI: 10.3389/fonc.2024.1241221.

Use of biomarkers of metals to improve prediction performance of cardiovascular disease mortality.

Fansler S, Bakulski K, Park S, Walker E, Wang X Environ Health. 2024; 23(1):96.

PMID: 39511585 PMC: 11542438. DOI: 10.1186/s12940-024-01137-4.

Predicting survival benefits of immune checkpoint inhibitor therapy in lung cancer patients: a machine learning approach using real-world data.

Pan L, Mu L, Lei H, Miao S, Hu X, Tang Z Int J Clin Pharm. 2024; .

PMID: 39470981 DOI: 10.1007/s11096-024-01818-7.

Survival analysis in breast cancer: evaluating ensemble learning techniques for prediction.

Buyrukoglu G PeerJ Comput Sci. 2024; 10:e2147.

PMID: 39145224 PMC: 11323082. DOI: 10.7717/peerj-cs.2147.

CBioProfiler: A Web and Standalone Pipeline for Cancer Biomarker and Subtype Characterization.

Liu X, Wang Z, Shi H, Li S, Wang X Genomics Proteomics Bioinformatics. 2024; 22(3).

PMID: 38867700 PMC: 11464420. DOI: 10.1093/gpbjnl/qzae045.
