Assessing the Performance of Prediction Models: a Framework for Traditional and Novel Measures

Overview
Journal: Epidemiology
Specialty: Public Health
Date: 2009 Dec 17
PMID: 20010215
Citations: 1798
Abstract

The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration.

Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.

We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation).

We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
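To make three of the measures named in the abstract concrete, the following is a minimal Python sketch for a binary outcome: the Brier score (overall performance), the c statistic (discrimination), and the net benefit at a single risk threshold (the quantity a decision curve plots across thresholds). The function names and the toy data are hypothetical and chosen only for illustration; they are not the authors' code or the testicular cancer case study data, and the sketch does not cover survival outcomes, calibration, NRI, or IDI.

```python
def brier_score(y, p):
    """Mean squared difference between predicted risk p and the 0/1 outcome y."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Proportion of event/non-event pairs in which the event received the
    higher predicted risk; tied predictions count as half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def net_benefit(y, p, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: observed outcomes (1 = event) and predicted risks.
y = [1, 1, 1, 0, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(f"Brier score: {brier_score(y, p):.4f}")
print(f"c statistic: {c_statistic(y, p):.4f}")
print(f"Net benefit at threshold 0.5: {net_benefit(y, p, 0.5):.4f}")
```

Evaluating `net_benefit` over a range of thresholds, and comparing the result with the "treat all" and "treat none" strategies, would give the points of a decision curve.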

Citing Articles

Risk Prediction Models for Sentinel Node Positivity in Melanoma: A Systematic Review and Meta-Analysis.

Ma B, Gandhi M, Czyz S, Jia J, Rankin B, Osman S JAMA Dermatol. 2025; .

PMID: 40072444 PMC: 11904803. DOI: 10.1001/jamadermatol.2025.0113.


International validation of a pre-transplant risk assessment tool for graft survival in pediatric kidney transplant recipients.

Oomen L, de Wall L, Tonshoff B, Krupka K, Harambat J, Hogan J Clin Kidney J. 2025; 18(3):sfaf031.

PMID: 40052169 PMC: 11883223. DOI: 10.1093/ckj/sfaf031.


Optimal spirometry thresholds for the prediction of chronic airflow obstruction: a multinational longitudinal study.

Lam A, Alhajri S, Potts J, Harrabi I, Anand M, Janson C ERJ Open Res. 2025; 11(2).

PMID: 40040898 PMC: 11873882. DOI: 10.1183/23120541.00624-2024.


Predictive performance of risk prediction models for lung cancer incidence in Western and Asian countries: a systematic review and meta-analysis.

Juang Y, Ang L, Seow W Sci Rep. 2025; 15(1):4259.

PMID: 40038330 PMC: 11880538. DOI: 10.1038/s41598-024-83875-6.


Development of Multimorbidity Indexes Based on Common Mental Health Conditions.

Kose J, Kesse-Guyot E, Duquenne P, Hercberg S, Galan P, Touvier M Int J Public Health. 2025; 70:1607952.

PMID: 40012814 PMC: 11859580. DOI: 10.3389/ijph.2025.1607952.

