Solving the Many-variables Problem in MICE with Principal Component Regression

Overview
Publisher Springer
Specialty Social Sciences
Date 2023 Aug 4
PMID 37540467
Abstract

Multiple imputation (MI) is one of the most popular approaches to addressing missing values in questionnaires and surveys. MI with multivariate imputation by chained equations (MICE) allows flexible imputation of many types of data. In MICE, for each variable under imputation, the imputer needs to specify which variables should act as predictors in the imputation model. Selecting these predictors is a difficult but fundamental step in the MI procedure, especially when a data set contains many variables. In this project, we explore the use of principal component regression (PCR) as a univariate imputation method in the MICE algorithm to automatically address the many-variables problem that arises when imputing large social science data sets. We compare different implementations of PCR-based MICE with a correlation-thresholding strategy through two Monte Carlo simulation studies and a case study. We find that using PCR on a variable-by-variable basis performs best and that it can perform nearly as well as expertly designed imputation procedures.
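To make the idea of PCR as a univariate imputation method inside a chained-equations loop concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names `pcr_impute_column` and `mice_pcr`, the mean-based starting values, the fixed number of components, and the choice to add only residual noise (rather than also drawing regression parameters, as a proper MI procedure would) are all assumptions made for illustration. For each variable in turn, the sketch recomputes principal components from the remaining columns, regresses the observed values on the leading component scores, and fills in the missing entries.

```python
import numpy as np

def pcr_impute_column(filled, j, missing, n_components, rng):
    """Impute column j using PCR on all other (currently filled) columns.

    Hypothetical helper for illustration only.
    """
    miss_j = missing[:, j]
    if not miss_j.any():
        return filled[:, j]
    predictors = np.delete(filled, j, axis=1)
    # Standardize predictors, guarding against constant columns.
    mu = predictors.mean(axis=0)
    sd = predictors.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (predictors - mu) / sd
    # Principal component scores from the SVD of the standardized predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = Z @ Vt[:k].T
    design = np.column_stack([np.ones(len(scores)), scores])
    # Least-squares regression of the observed target values on the scores.
    obs = ~miss_j
    beta, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
    resid = filled[obs, j] - design[obs] @ beta
    dof = max(int(obs.sum()) - design.shape[1], 1)
    sigma = np.sqrt(resid @ resid / dof)
    # Assumption: add residual noise only; parameter uncertainty is ignored.
    out = filled[:, j].copy()
    out[miss_j] = design[miss_j] @ beta + rng.normal(0.0, sigma, int(miss_j.sum()))
    return out

def mice_pcr(data, n_iter=10, n_components=2, seed=0):
    """Chained-equations loop that calls the PCR step variable by variable."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # Starting values: column means of the observed entries.
    filled = np.where(missing, np.nanmean(data, axis=0), data)
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            filled[:, j] = pcr_impute_column(filled, j, missing, n_components, rng)
    return filled
```

Calling `mice_pcr` several times with different `seed` values would give multiple completed data sets. Because the components summarize all other columns, no predictor needs to be hand-picked for any variable, which is the appeal of the approach when the data set has many variables.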

Citing Articles

Burden of prolonged treatment delay among patients with common cancers in the Philippines.

Cambia J, Wannasri A, Orlina E, Calvez G, Grafilo W, Liu J. Cancer Causes Control. 2025; .

PMID: 39992497 DOI: 10.1007/s10552-025-01969-6.


A novel MissForest-based missing values imputation approach with recursive feature elimination in medical applications.

Hu Y, Wu R, Lin Y, Lin T. BMC Med Res Methodol. 2024; 24(1):269.

PMID: 39516783 PMC: 11546113. DOI: 10.1186/s12874-024-02392-2.
