Minimum Sample Size for External Validation of a Clinical Prediction Model with a Binary Outcome

Overview

Journal: Stat Med

Publisher: Wiley

Specialty: Public Health

Date: 2021 May 25

PMID: 34031906

Citations: 79

Authors

Richard D Riley

Thomas P A Debray

Gary S Collins

Lucinda Archer

Joie Ensor

Maarten van Smeden

Kym I E Snell

Abstract

In prediction model research, external validation is needed to examine an existing model's performance using data independent of the data used for model development. Current external validation studies often suffer from small sample sizes and consequently imprecise estimates of predictive performance. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (Observed/Expected and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target standard errors (SEs), or equivalently confidence interval widths, for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to inform whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
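The arithmetic behind the reported figures can be sketched in a few lines. The example below is a minimal illustration, not the authors' software: it checks that the stated participant counts and the event proportion of 0.018 reproduce the stated event counts, and it shows one closed-form criterion for Observed/Expected (O/E) based on the standard delta-method approximation SE(ln(O/E)) ≈ sqrt((1 − φ)/(nφ)), where φ is the anticipated event proportion. The function names and the target SE of 0.1 are assumptions chosen for illustration; the abstract does not state the targets used, and the paper's exact formulation may differ.

```python
import math


def expected_events(n_participants: int, event_proportion: float) -> float:
    """Expected number of outcome events in a sample of the given size."""
    return n_participants * event_proportion


def se_ln_oe(n_participants: int, event_proportion: float) -> float:
    """Approximate SE of ln(O/E) via the delta method (assumed form)."""
    phi = event_proportion
    return math.sqrt((1.0 - phi) / (n_participants * phi))


def n_for_oe(event_proportion: float, target_se_ln_oe: float) -> int:
    """Smallest n whose approximate SE of ln(O/E) meets the target.

    Obtained by solving sqrt((1 - phi) / (n * phi)) <= target for n.
    """
    phi = event_proportion
    return math.ceil((1.0 - phi) / (phi * target_se_ln_oe ** 2))


if __name__ == "__main__":
    phi = 0.018  # event proportion stated in the abstract

    # Participant counts stated in the abstract.
    print(round(expected_events(9835, phi)))  # 177
    print(round(expected_events(6443, phi)))  # 116

    # Hypothetical target SE for ln(O/E); not taken from the paper.
    target = 0.1
    n = n_for_oe(phi, target)
    print(n, se_ln_oe(n, phi))
```

With a rare outcome such as this one, the required number of participants is large even when the required number of events is modest, which is why the abstract reports both figures.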

Citing Articles

Development and validation of a nomogram for predicting postoperative pulmonary complications in older patients undergoing noncardiac thoracic surgery: a prospective, bicentric cohort study.

Zhou Y, Wang H, Lu D, Jiang T, Huang Z, Wang F. BMC Geriatr. 2025; 25(1):169.

PMID: 40082767 PMC: 11905546. DOI: 10.1186/s12877-025-05791-2.

Complete Blood Count and Monocyte Distribution Width-Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study.

Campagner A, Agnello L, Carobene A, Padoan A, Del Ben F, Locatelli M. J Med Internet Res. 2025; 27:e55492.

PMID: 40009841 PMC: 11904381. DOI: 10.2196/55492.

Externally validated and clinically useful machine learning algorithms to support patient-related decision-making in oncology: a scoping review.

Santos C, Amorim-Lopes M. BMC Med Res Methodol. 2025; 25(1):45.

PMID: 39984835 PMC: 11843972. DOI: 10.1186/s12874-025-02463-y.

Predictive performance of prehospital trauma triage tools for resuscitative interventions within 24 hours in high-risk or life-threatening prehospital trauma patients.

Jenpanitpong C, Yuksen C, Trakulsrichai S, Sricharoen P, Leela-Amornsin S, Savatmongkorngul S. BMC Emerg Med. 2025; 25(1):26.

PMID: 39979975 PMC: 11841352. DOI: 10.1186/s12873-025-01188-x.

Bidirectional cohort study protocol to construct and validate a prediction model for perioperative pulmonary complications in elderly hip fracture patients.

Wang L, Tian Y, Shen J, Fan X, Dong X, Chen J. Sci Rep. 2025; 15(1):6097.

PMID: 39971947 PMC: 11840002. DOI: 10.1038/s41598-025-89037-6.