
Imputation of Missing Values is Superior to Complete Case Analysis and the Missing-indicator Method in Multivariable Diagnostic Research: a Clinical Example

Overview
Publisher Elsevier
Specialty Public Health
Date 2006 Sep 19
PMID 16980151
Citations 195
Abstract

Background and Objectives: To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation [MI]) in multivariable diagnostic research that aims to identify potential predictors (test results) that independently contribute to predicting the presence or absence of disease.
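As a rough illustration of how the single-dataset strategies named above differ, the following minimal sketch applies each one to a toy dataset. The predictor names (`age`, `d_dimer`), the values, and the helper functions are hypothetical and are not taken from the study; the conditional mean here comes from a simple least-squares fit on one other predictor, which is an assumption made for brevity.

```python
from statistics import mean

# Toy data (hypothetical): two predictors per subject; None marks a missing value.
rows = [
    {"age": 50, "d_dimer": 1.0},
    {"age": 60, "d_dimer": 2.0},
    {"age": 70, "d_dimer": None},
    {"age": 80, "d_dimer": 4.0},
]

def complete_cases(rows, var):
    """Complete case analysis: drop every subject with a missing value."""
    return [dict(r) for r in rows if r[var] is not None]

def missing_indicator(rows, var, fill=0.0):
    """Missing-indicator method: fill with a fixed value and add a 0/1 flag."""
    return [
        {**r,
         var: fill if r[var] is None else r[var],
         var + "_missing": int(r[var] is None)}
        for r in rows
    ]

def unconditional_mean(rows, var):
    """Single imputation with the overall mean of the observed values."""
    m = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: m if r[var] is None else r[var]} for r in rows]

def conditional_mean(rows, var, given):
    """Single imputation with the value predicted from another predictor
    (ordinary least squares fitted on the subjects with observed values)."""
    obs = [(r[given], r[var]) for r in rows if r[var] is not None]
    mx = mean(x for x, _ in obs)
    my = mean(y for _, y in obs)
    slope = (sum((x - mx) * (y - my) for x, y in obs)
             / sum((x - mx) ** 2 for x, _ in obs))
    intercept = my - slope * mx
    return [
        {**r, var: intercept + slope * r[given] if r[var] is None else r[var]}
        for r in rows
    ]

cc = complete_cases(rows, "d_dimer")
mi_flag = missing_indicator(rows, "d_dimer")
um = unconditional_mean(rows, "d_dimer")
cm = conditional_mean(rows, "d_dimer", given="age")
```

Multiple imputation can be thought of as repeating a stochastic version of the conditional approach several times, so that the uncertainty about each imputed value carries through to the final model.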

Methods: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Several diagnostic predictors (tests) had missing values, in varying percentages. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression.

Results: The area under the receiver operating characteristic curve was above 0.75 for all diagnostic models. The predictors in the final models based on complete case analysis and on the missing-indicator method differed markedly from those in the other models. The models based on MI did not differ much from the models derived after single conditional and unconditional mean imputation.

Conclusion: In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. In our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.
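One reason MI is generally preferred over single imputation is that it carries the uncertainty of the imputed values into the standard errors. The abstract does not say how results from the imputed datasets were combined; the sketch below assumes the commonly used Rubin's rules, and the coefficient estimates and variances are invented purely for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed datasets.

    estimates: coefficient estimate from each imputed dataset
    variances: squared standard error of that estimate in each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)            # average of the m estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical values from m = 3 imputed datasets.
pooled, total = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.03])
```

A single imputed dataset yields only the within-imputation variance, so the between-imputation term is ignored and standard errors tend to be too small; with few missing values that term is small, which fits the observation that single imputation performed similarly in this example.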
