External Validation of New Risk Prediction Models is Infrequent and Reveals Worse Prognostic Discrimination

Overview

Journal J Clin Epidemiol

Publisher Elsevier

Specialty Public Health

Date 2014 Dec 3

PMID 25441703

Citations 174

Authors

George C M Siontis

Ioanna Tzoulaki

Peter J Castaldi

John P A Ioannidis

Abstract

Objectives: To evaluate how often newly developed risk prediction models undergo external validation and how well they perform in such validations.

Study Design And Setting: We reviewed derivation studies of newly proposed risk models and their subsequent external validations. Study characteristics, outcome(s), and models' discriminatory performance [area under the curve (AUC)] in derivation and validation studies were extracted. We estimated the probability of a model having an external validation, and the change in discriminatory performance on more stringent external validation, by overlapping or by different authors, compared with the derivation estimates.
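As an illustration of the quantity being compared, the sketch below computes an AUC in the derivation data and in an external validation set, then takes their difference. It is a minimal example, not the authors' procedure: the function name `auc` and all predicted risks are hypothetical, and the AUC is taken in its usual rank-based sense (the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half).

```python
from itertools import product


def auc(scores_events, scores_nonevents):
    """Rank-based AUC: share of event/non-event pairs in which the event
    case has the higher predicted risk (ties count as 0.5)."""
    wins = 0.0
    for event_score, nonevent_score in product(scores_events, scores_nonevents):
        if event_score > nonevent_score:
            wins += 1.0
        elif event_score == nonevent_score:
            wins += 0.5
    return wins / (len(scores_events) * len(scores_nonevents))


# Hypothetical predicted risks (illustrative only, not data from the study).
derivation_events = [0.9, 0.8, 0.7, 0.6]
derivation_nonevents = [0.5, 0.4, 0.65, 0.2]
validation_events = [0.9, 0.6, 0.5, 0.4]
validation_nonevents = [0.7, 0.45, 0.3, 0.2]

auc_derivation = auc(derivation_events, derivation_nonevents)
auc_validation = auc(validation_events, validation_nonevents)

# A negative change means discrimination was worse on external validation.
auc_change = auc_validation - auc_derivation

print(f"derivation AUC = {auc_derivation:.4f}")
print(f"validation AUC = {auc_validation:.4f}")
print(f"AUC change     = {auc_change:+.4f}")
```

The pairwise loop is quadratic in sample size and is used here only because it mirrors the definition directly; for real cohorts a rank-based or library implementation would be more practical.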

Results: We evaluated 127 new prediction models. Of those, for 32 models (25%), at least one external validation study was identified; in 22 models (17%), the validation had been done by entirely different authors. The probability of having an external validation by different authors within 5 years was 16%. AUC estimates significantly decreased during external validation vs. the derivation study [median AUC change: -0.05 (P < 0.001) overall; -0.04 (P = 0.009) for validation by overlapping authors; -0.05 (P < 0.001) for validation by different authors]. On external validation, AUC decreased by at least 0.03 in 19 models and never increased by at least 0.03 (P < 0.001).
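The arithmetic behind some of these figures can be checked with a short sketch. The percentages follow directly from the reported counts. For the 19-versus-0 comparison, the abstract does not say which test produced P < 0.001; the code below assumes an exact two-sided sign test (binomial with probability 0.5) purely for illustration, and the helper name `sign_test_two_sided` is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_decrease, n_increase):
    """Exact two-sided sign test: under the null hypothesis, a decrease and
    an increase are equally likely (binomial with p = 0.5)."""
    n = n_decrease + n_increase
    k = max(n_decrease, n_increase)
    # Probability of a split at least as lopsided as the observed one,
    # in one direction, then doubled for the two-sided test.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts reported in the abstract.
total_models = 127
any_external_validation = 32
validated_by_different_authors = 22

pct_any = round(100 * any_external_validation / total_models)
pct_different = round(100 * validated_by_different_authors / total_models)

# 19 models with an AUC decrease of at least 0.03, none with such an increase.
p_value = sign_test_two_sided(19, 0)

print(f"validated at all:     {pct_any}%")
print(f"by different authors: {pct_different}%")
print(f"sign test p-value:    {p_value:.2e}")
```

Under this assumed test the 19-to-0 split is consistent with the reported P < 0.001. The 16% probability of validation by different authors within 5 years is not reproduced here, because it depends on follow-up times that the abstract does not give.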

Conclusion: External independent validation of predictive models in different studies is uncommon. Predictive performance may worsen substantially on external validation.

Citing Articles

Predictive performance of risk prediction models for lung cancer incidence in Western and Asian countries: a systematic review and meta-analysis.

Juang Y, Ang L, Seow W Sci Rep. 2025; 15(1):4259.

PMID: 40038330 PMC: 11880538. DOI: 10.1038/s41598-024-83875-6.

The diagnostic and prognostic capability of artificial intelligence in spinal cord injury: A systematic review.

Gill S, Subbiah Ponniah H, Giersztein S, Anantharaj R, Namireddy S, Killilea J Brain Spine. 2025; 5:104208.

PMID: 40027293 PMC: 11871462. DOI: 10.1016/j.bas.2025.104208.

A deep learning model for clinical outcome prediction using longitudinal inpatient electronic health records.

Rong R, Gu Z, Lai H, Nelson T, Keller T, Walker C medRxiv. 2025.

PMID: 39974062 PMC: 11838940. DOI: 10.1101/2025.01.21.25320916.

Prediction Tools in Spine Surgery: A Narrative Review.

Jadresic M, Baker J Spine Surg Relat Res. 2025; 9(1):1-10.

PMID: 39935977 PMC: 11808232. DOI: 10.22603/ssrr.2024-0189.

Instability of the AUROC of Clinical Prediction Models.

van Leeuwen F, Steyerberg E, van Klaveren D, Wessler B, Kent D, van Zwet E Stat Med. 2025; 44(5):e70011.

PMID: 39921554 PMC: 11806515. DOI: 10.1002/sim.70011.