Flexible Boosting of Accelerated Failure Time Models
Background: When boosting algorithms are used for building survival models from high-dimensional data, it is common to fit a Cox proportional hazards model or to use least squares techniques for fitting semiparametric accelerated failure time models. There are cases, however, where fitting a fully parametric accelerated failure time model is a good alternative to these methods, especially when the proportional hazards assumption is not justified. Boosting algorithms for the estimation of parametric accelerated failure time models have not been developed so far, since these models require the estimation of a model-specific scale parameter which traditional boosting algorithms are not able to deal with.
Results: We introduce a new boosting algorithm for censored time-to-event data which is suitable for fitting parametric accelerated failure time models. Estimation of the predictor function is carried out simultaneously with the estimation of the scale parameter, so that the negative log likelihood of the survival distribution can be used as a loss function for the boosting algorithm. The estimation of the scale parameter does not affect the favorable properties of boosting with respect to variable selection.
Conclusion: The analysis of a high-dimensional set of microarray data demonstrates that the new algorithm is able to outperform boosting with the Cox partial likelihood when the proportional hazards assumption is questionable. In low-dimensional settings, i.e., when classical likelihood estimation of a parametric accelerated failure time model is possible, simulations show that the new boosting algorithm closely approximates the estimates obtained from the maximum likelihood method.
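To make the simultaneous estimation described in the Results concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Weibull accelerated failure time model, log T = f(x) + sigma * W with W following a standard extreme-value distribution, componentwise linear base learners, a step length `nu`, and a golden-section search for the scale update. All function names (`neg_log_lik`, `update_scale`, `boost_aft`, `simulate`), the simulated data, and the tuning values are illustrative assumptions that do not appear in the abstract.

```python
import math
import random


def neg_log_lik(log_t, delta, f, sigma):
    """Negative log-likelihood of a Weibull AFT model with right censoring.

    Assumed model: log T = f + sigma * W, W standard extreme-value.
    delta[i] is 1 for an observed event and 0 for a censored time.
    """
    log_sigma = math.log(sigma)
    total = 0.0
    for y, d, fi in zip(log_t, delta, f):
        z = (y - fi) / sigma
        # exp(z) is minus the log-survival term; events add the log-density part.
        total += math.exp(min(z, 700.0)) - d * (z - log_sigma)
    return total


def update_scale(log_t, delta, f, lo=-3.0, hi=3.0, iters=25):
    """Golden-section search over log(sigma) with the predictor f held fixed."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if (neg_log_lik(log_t, delta, f, math.exp(c))
                < neg_log_lik(log_t, delta, f, math.exp(d))):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2.0)


def boost_aft(X, log_t, delta, n_iter=150, nu=0.1):
    """Componentwise gradient boosting; sigma is re-estimated in every iteration."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    f = [0.0] * n
    sigma = update_scale(log_t, delta, f)
    sxx = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        # Negative gradient of the loss with respect to f at the current sigma.
        u = [(math.exp(min((y - fi) / sigma, 700.0)) - d) / sigma
             for y, d, fi in zip(log_t, delta, f)]
        # Least-squares fit of u on each column; keep the best-fitting column.
        sxu = [sum(row[j] * ui for row, ui in zip(X, u)) for j in range(p)]
        best = max(range(p), key=lambda j: sxu[j] ** 2 / sxx[j])
        step = nu * sxu[best] / sxx[best]
        coef[best] += step
        f = [fi + step * row[best] for fi, row in zip(f, X)]
        # Scale step: minimize the same loss over sigma with f fixed.
        sigma = update_scale(log_t, delta, f)
    return coef, sigma


def simulate(n=150, seed=0):
    """Hypothetical data: constant column, one informative and three noise covariates."""
    rng = random.Random(seed)
    X, log_t, delta = [], [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(4)]
        w = math.log(rng.expovariate(1.0))  # standard extreme-value noise
        event = 1.0 + 1.0 * x[1] + 0.8 * w  # true scale parameter is 0.8
        cens = 2.0 + rng.gauss(0.0, 1.0)    # log censoring time
        X.append(x)
        log_t.append(min(event, cens))
        delta.append(1 if event <= cens else 0)
    return X, log_t, delta


X, log_t, delta = simulate()
coef, sigma = boost_aft(X, log_t, delta)
print("coefficients:", [round(c, 3) for c in coef])
print("scale estimate:", round(sigma, 3))
```

In this sketch the intercept is handled by the constant first column of `X`, so it competes with the covariates for selection in each iteration. Because only one coefficient is updated per iteration, covariates that are never selected keep a coefficient of exactly zero, which is how variable selection arises in componentwise boosting.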
Genetic Prediction Modeling in Large Cohort Studies via Boosting Targeted Loss Functions.
Klinkhammer H, Staerk C, Maj C, Krawitz P, Mayr A. Stat Med. 2024; 43(28):5412-5430.
PMID: 39440393 PMC: 11586906. DOI: 10.1002/sim.10249.
Tutorial on survival modeling with applications to omics data.
Zhao Z, Zobolas J, Zucknick M, Aittokallio T. Bioinformatics. 2024; 40(3).
PMID: 38445722 PMC: 10973942. DOI: 10.1093/bioinformatics/btae132.
A boosting first-hitting-time model for survival analysis in high-dimensional settings.
De Bin R, Stikbakke V. Lifetime Data Anal. 2022; 29(2):420-440.
PMID: 35476164 PMC: 10006065. DOI: 10.1007/s10985-022-09553-9.
Boosted nonparametric hazards with time-dependent covariates.
Lee D, Chen N, Ishwaran H. Ann Stat. 2021; 49(4):2101-2128.
PMID: 34937956 PMC: 8691747. DOI: 10.1214/20-aos2028.
Review of statistical methods for survival analysis using genomic data.
Lee S, Lim H. Genomics Inform. 2020; 17(4):e41.
PMID: 31896241 PMC: 6944043. DOI: 10.5808/GI.2019.17.4.e41.