
Missing Data in Clinical Research: A Tutorial on Multiple Imputation

Overview
Journal Can J Cardiol
Publisher Elsevier
Date 2020 Dec 4
PMID 33276049
Citations 215
Abstract

Missing data is a common occurrence in clinical research. Missing data occurs when the values of the variables of interest are not measured or recorded for all subjects in the sample. Common approaches to addressing missing data include complete-case analysis, in which subjects with missing data are excluded, and mean-value imputation, in which missing values are replaced with the mean of that variable among the subjects for whom it was observed. However, in many settings, these approaches can lead to biased estimates of statistics (eg, of regression coefficients) and/or confidence intervals that are artificially narrow. Multiple imputation (MI) is a popular approach for addressing missing data. With MI, multiple plausible values of a given variable are imputed, or filled in, for each subject who has missing data for that variable. This results in the creation of multiple completed data sets. Identical statistical analyses are conducted in each of these completed data sets, and the results are pooled across the completed data sets. We provide an introduction to MI and discuss issues in its implementation, including developing the imputation model, deciding how many imputed data sets to create, and addressing derived variables. We illustrate the application of MI through an analysis of data on patients hospitalised with heart failure. We focus on developing a model to estimate the probability of 1-year mortality in the presence of missing data. Statistical software code for conducting MI in R, SAS, and Stata is provided.
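
The pooling step described above can be sketched in a few lines. The abstract does not say which pooling rule the tutorial uses, so the sketch below assumes Rubin's rules, the standard choice for MI. The function name `pool_rubin` and the input numbers are hypothetical and serve only as illustration. The article's own code is in R, SAS, and Stata, and Python is used here purely for the sketch.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m completed data sets (assumes Rubin's rules).

    estimates: the coefficient estimate from each completed data set
    variances: the squared standard error of that estimate in each data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical estimates and squared standard errors from m = 5 analyses
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.011, 0.009, 0.010, 0.010]
pooled, se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {se:.3f}")
```

With these made-up inputs the pooled standard error (about 0.106) is larger than the 0.100 implied by the average within-imputation variance alone. The extra between-imputation term carries the uncertainty due to the missing values, which single mean-value imputation ignores.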

Citing Articles

Relationship between plasma atherogenic index and incidence of cardiovascular diseases in Chinese middle-aged and elderly people.

Zhao M, Xiao M, Zhang H, Tan Q, Ji J, Cheng Y Sci Rep. 2025; 15(1):8775.

PMID: 40082452 PMC: 11906849. DOI: 10.1038/s41598-025-86213-6.


Nomogram for Predicting Early AVF Failure in Elderly Diabetic Patients: Methodological and Clinical Considerations [Letter].

Liu S, Tian K, Zhang Y Diabetes Metab Syndr Obes. 2025; 18:677-678.

PMID: 40041810 PMC: 11878112. DOI: 10.2147/DMSO.S521525.


Relationship between atherogenic index of plasma and length of stay in critically ill patients with atherosclerotic cardiovascular disease: a retrospective cohort study and predictive modeling based on machine learning.

Guo Y, Wang F, Ma S, Mao Z, Zhao S, Sui L Cardiovasc Diabetol. 2025; 24(1):95.

PMID: 40022165 PMC: 11871731. DOI: 10.1186/s12933-025-02654-3.


Impact of inappropriate empirical antibiotic therapy on in-hospital mortality: a retrospective multicentre cohort study of patients with bloodstream infections in Chile, 2018-2022.

Allel K, Peters A, Furuya-Kanamori L, Spencer-Sandino M, Pitchforth E, Yakob L BMJ Public Health. 2025; 2(2):e001289.

PMID: 40018577 PMC: 11816519. DOI: 10.1136/bmjph-2024-001289.


Frailty as an independent risk factor for sepsis-associated delirium: a cohort study of 11,740 older adult ICU patients.

Zheng G, Yan J, Li W, Chen Z Aging Clin Exp Res. 2025; 37(1):52.

PMID: 40011361 PMC: 11865144. DOI: 10.1007/s40520-025-02956-2.

