
Understanding Increments in Model Performance Metrics

Overview
Publisher: Springer
Date: 2012 Dec 18
PMID: 23242535
Citations: 18
Abstract

The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. However, recently it has been criticized for its inability to increase when important risk factors are added to a baseline model with good discrimination. This has led to the claim that the reliance on the AUC as a measure of discrimination may miss important improvements in clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of discrimination slope. We show that unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good versus poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
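The diminishing-returns behavior described in the abstract can be illustrated with a small numerical sketch. It is not the paper's derivation. It assumes the standard binormal setting: predictors are multivariate normal in events and nonevents with a common covariance matrix, so the AUC of the optimal linear score is Φ(M/√2), where M² is the squared Mahalanobis distance between the two group means, and sensitivity at a fixed specificity is Φ(M − Φ⁻¹(specificity)). It also assumes the added predictor is uncorrelated with the baseline score, so squared distances add. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def auc_from_m2(m2: float) -> float:
    """AUC of the optimal linear score for squared Mahalanobis distance m2.

    Assumes multivariate normal predictors with equal covariance
    in events and nonevents (an assumption of this sketch).
    """
    return STD_NORMAL.cdf(sqrt(m2 / 2.0))


def sensitivity_at_specificity(m2: float, specificity: float) -> float:
    """Sensitivity of the linear score at the threshold giving `specificity`.

    The score is scaled so nonevents are N(0, 1) and events are N(M, 1).
    """
    threshold = STD_NORMAL.inv_cdf(specificity)
    return STD_NORMAL.cdf(sqrt(m2) - threshold)


def auc_gain(baseline_m2: float, added_m2: float) -> float:
    """Change in AUC when an uncorrelated predictor adds `added_m2` to M^2."""
    return auc_from_m2(baseline_m2 + added_m2) - auc_from_m2(baseline_m2)


def sensitivity_gain(baseline_m2: float, added_m2: float, specificity: float) -> float:
    """Change in sensitivity at fixed specificity for the same added predictor."""
    return (sensitivity_at_specificity(baseline_m2 + added_m2, specificity)
            - sensitivity_at_specificity(baseline_m2, specificity))


if __name__ == "__main__":
    added = 0.25  # hypothetical strength of the new predictor
    for baseline in (0.5, 2.0, 4.0):
        print(
            f"baseline AUC {auc_from_m2(baseline):.3f}: "
            f"AUC gain {auc_gain(baseline, added):.4f}, "
            f"sensitivity gain at 95% specificity "
            f"{sensitivity_gain(baseline, added, 0.95):.4f}"
        )
```

Running the script compares the same hypothetical predictor added to baselines of increasing strength, which shows how the AUC increment depends on the baseline and how it can diverge from the change in sensitivity when high specificity is required.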

Citing Articles

Prognostic accuracy of 70 individual frailty biomarkers in predicting mortality in the Canadian Longitudinal Study on Aging.

Blodgett J, Perez-Zepeda M, Godin J, Kehler D, Andrew M, Kirkland S. Geroscience. 2024; 46(3):3061-3069.

PMID: 38182858 PMC: 11009196. DOI: 10.1007/s11357-023-01055-2.


Minimum sample size for developing a multivariable prediction model using multinomial logistic regression.

Pate A, Riley R, Collins G, van Smeden M, Van Calster B, Ensor J. Stat Methods Med Res. 2023; 32(3):555-571.

PMID: 36660777 PMC: 10012398. DOI: 10.1177/09622802231151220.


Polygenic risk scores for prediction of breast cancer in Korean women.

Jee Y, Ho W, Park S, Easton D, Teo S, Jung K. Int J Epidemiol. 2022; 52(3):796-805.

PMID: 36343017 PMC: 10244045. DOI: 10.1093/ije/dyac206.


Predictive Utility of a Validated Polygenic Risk Score for Long-Term Risk of Coronary Heart Disease in Young and Middle-Aged Adults.

Khan S, Page C, Wojdyla D, Schwartz Y, Greenland P, Pencina M. Circulation. 2022; 146(8):587-596.

PMID: 35880530 PMC: 9398962. DOI: 10.1161/CIRCULATIONAHA.121.058426.


Longitudinal validation of an electronic health record delirium prediction model applied at admission in COVID-19 patients.

Castro V, Hart K, Sacks C, Murphy S, Perlis R, McCoy Jr T. Gen Hosp Psychiatry. 2021; 74:9-17.

PMID: 34798580 PMC: 8562039. DOI: 10.1016/j.genhosppsych.2021.10.005.

