
A Simulation Study of Sample Size Demonstrated the Importance of the Number of Events Per Variable to Develop Prediction Models in Clustered Data

Overview
Publisher Elsevier
Specialty Public Health
Date 2015 Mar 31
PMID 25817942
Citations 58
Abstract

Objectives: This study aims to investigate how the amount of clustering [intraclass correlation (ICC) = 0%, 5%, or 20%], the number of events per variable (EPV), that is, per candidate predictor (EPV = 5, 10, 20, or 50), and backward variable selection influence the performance of prediction models.
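For multilevel logistic models, an ICC is usually defined on the latent scale, where the level-1 residual variance is fixed at π²/3 (the variance of the standard logistic distribution). Assuming that definition (the abstract does not say which one was used), the minimal sketch below converts between an ICC and the random-intercept variance it implies; the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution, pi^2 / 3 (about 3.29).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(sigma2_u: float) -> float:
    """Latent-scale ICC of a random-intercept logistic model."""
    return sigma2_u / (sigma2_u + LOGISTIC_RESIDUAL_VARIANCE)


def variance_from_icc(icc: float) -> float:
    """Random-intercept variance that yields the requested latent-scale ICC."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("ICC must lie in [0, 1)")
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1.0 - icc)


# The three clustering levels named in the Objectives.
for icc in (0.0, 0.05, 0.20):
    print(f"ICC = {icc:.0%}: sigma2_u = {variance_from_icc(icc):.3f}")
```

Under this definition, an ICC of 20% corresponds to a random-intercept variance of roughly 0.82 on the log-odds scale.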

Study Design and Setting: Researchers frequently combine data from several centers to develop clinical prediction models. In our simulation study, we developed models from clustered training data using multilevel logistic regression and validated them in external data.
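Clustered training data of this kind can be generated from a random-intercept logistic model, in which every center shares one intercept shift. The sketch below is only an illustration under assumed settings: the number of centers, cluster size, coefficients, and the latent-scale ICC definition are all hypothetical and are not the settings of the original study. Fitting the multilevel model and validating it externally are not shown.

```python
import numpy as np


def simulate_clustered_data(n_clusters, cluster_size, beta0, betas, icc, rng):
    """Draw binary outcomes from a random-intercept logistic model.

    All argument values used below are hypothetical illustrations,
    not the settings of the original simulation study.
    """
    betas = np.asarray(betas, dtype=float)
    # Random-intercept variance implied by a latent-scale ICC (assumption).
    sigma2_u = icc * (np.pi ** 2 / 3) / (1.0 - icc)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n_clusters)
    x = rng.normal(size=(cluster.size, betas.size))
    # Linear predictor: fixed effects plus each patient's center intercept.
    lp = beta0 + x @ betas + u[cluster]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lp)))
    return x, y, cluster


rng = np.random.default_rng(2015)
x, y, cluster = simulate_clustered_data(
    n_clusters=20, cluster_size=50, beta0=-1.5,
    betas=[0.5, -0.3, 0.8], icc=0.20, rng=rng,
)
print(x.shape, y.mean(), np.unique(cluster).size)
```

Setting `icc=0.0` gives a zero random-intercept variance, which reproduces the unclustered scenario with the same code path.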

Results: The amount of clustering was not meaningfully associated with the models' predictive performance. The median calibration slope of models built in samples with EPV = 5 and strong clustering (ICC = 20%) was 0.71; with EPV = 5 and ICC = 0%, it was 0.72. A higher EPV was associated with better performance: the calibration slope was 0.85 at EPV = 10 and ICC = 20%, and 0.96 at EPV = 50 and ICC = 20%. Variable selection sometimes led to substantial relative bias in the estimated predictor effects (up to 118% at EPV = 5), but this had little influence on the models' performance in our simulations.
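A calibration slope is conventionally obtained by regressing the observed outcomes in validation data on the model's predicted linear predictor (log-odds); a slope below 1 means the predictions are too extreme, a typical sign of overfitting. The abstract does not say exactly how the slope was computed (for example, whether random effects entered the predictions), so the sketch below assumes a plain single-level logistic recalibration, with a hand-written Newton-Raphson fit to stay dependency-light. The helper names and the toy numbers are hypothetical; `relative_bias` shows the usual percent definition behind a figure such as 118%.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta


def calibration_slope(lp_pred, y):
    """Slope from regressing the outcome on the model's linear predictor."""
    return fit_logistic(lp_pred[:, None], y)[1]


def relative_bias(estimate, true_value):
    """Relative bias of a coefficient estimate, in percent."""
    return 100.0 * (estimate - true_value) / true_value


rng = np.random.default_rng(0)
true_lp = rng.normal(-1.0, 1.0, size=20000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))
overfit_lp = true_lp / 0.71  # predictions too extreme, as after overfitting
print(calibration_slope(true_lp, y), calibration_slope(overfit_lp, y))
print(relative_bias(2.18, 1.0))  # hypothetical estimate vs. true effect
```

Because the slope is estimated on the log-odds scale, shrinking the predicted linear predictor by the estimated slope is the standard way to recalibrate an overfitted model.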

Conclusion: We recommend at least 10 EPV to fit prediction models in clustered data using logistic regression. Up to 50 EPV may be needed when variable selection is performed.
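Once the number of candidate predictors and the expected outcome incidence are fixed, an EPV target translates directly into a minimum total sample size. The minimal sketch below uses hypothetical names and numbers (8 candidate predictors, 20% incidence) and counts expected rather than observed events, so the realised EPV in any particular sample will scatter around the target.

```python
import math


def events_per_variable(n_events, n_candidates):
    """EPV: number of events divided by the number of candidate predictors."""
    return n_events / n_candidates


def min_sample_size(target_epv, n_candidates, event_rate):
    """Smallest total sample whose expected events reach target_epv."""
    n = target_epv * n_candidates / event_rate
    # Small tolerance absorbs floating-point error (e.g. 400 / 0.2).
    return math.ceil(n - 1e-9)


# Hypothetical setting: 8 candidate predictors, 20% outcome incidence.
for epv in (10, 50):
    print(epv, min_sample_size(epv, n_candidates=8, event_rate=0.20))
```

With these assumed inputs, moving from 10 to 50 EPV raises the required sample fivefold, which shows how costly the stricter target for variable selection can be.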

Citing Articles

A new preprocedural predictive risk model for post-endoscopic retrograde cholangiopancreatography pancreatitis: The SuPER model.

Sugimoto M, Takagi T, Suzuki T, Shimizu H, Shibukawa G, Nakajima Y Elife. 2025; 13.

PMID: 39819489 PMC: 11741517. DOI: 10.7554/eLife.101604.


Development and Validation of a Risk Prediction Model for Sarcopenia in Chinese Older Patients with Type 2 Diabetes Mellitus.

Wang X, Gao S Diabetes Metab Syndr Obes. 2024; 17:4611-4626.

PMID: 39635500 PMC: 11616483. DOI: 10.2147/DMSO.S493903.


Prognostic risk prediction model for patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD): a systematic review and meta-analysis.

Xu Z, Li F, Xin Y, Wang Y, Wang Y Respir Res. 2024; 25(1):410.

PMID: 39543648 PMC: 11566839. DOI: 10.1186/s12931-024-03033-4.


Relationship between Respiratory Function and the Strength of the Abdominal Trunk Muscles Including the Diaphragm in Middle-Aged and Older Adult Patients.

Kurokawa Y, Kato S, Yokogawa N, Shimizu T, Matsubara H, Kabata T J Funct Morphol Kinesiol. 2024; 9(4).

PMID: 39449469 PMC: 11503391. DOI: 10.3390/jfmk9040175.


Establishment and validation of a prognostic model based on common laboratory indicators for SARS-CoV-2 infection in Chinese population.

Zhao A, Liu Y, Xia J, Huang L, Lu Q, Tang Q Ann Med. 2024; 56(1):2400312.

PMID: 39239874 PMC: 11382706. DOI: 10.1080/07853890.2024.2400312.