Multiple Imputation Methods for Handling Missing Values in a Longitudinal Categorical Variable with Restrictions on Transitions over Time: a Simulation Study

Overview
Publisher: BioMed Central
Date: 2019 Jan 12
PMID: 30630434
Citations: 15
Abstract

Background: Longitudinal categorical variables are sometimes restricted in how individuals can transition between categories over time. For example, with a time-dependent measure of smoking categorised as never-smoker, ex-smoker, and current-smoker, a current-smoker or ex-smoker cannot become a never-smoker at a subsequent wave. These longitudinal variables often contain missing values; however, there is little guidance on whether these restrictions need to be accommodated when using multiple imputation. Multiply imputing such missing values while ignoring the restrictions could lead to implausible transitions.
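
The restriction can be stated as a simple rule over one respondent's sequence of waves: once an ex- or current-smoker status has been recorded, no later wave may be never-smoker. A minimal Python sketch of that check follows; the integer coding in `Smoking` and the function name `implausible_waves` are hypothetical illustrations, not taken from the paper.

```python
from enum import IntEnum
from typing import List, Optional, Sequence


class Smoking(IntEnum):
    # Hypothetical coding of the three categories.
    NEVER = 0
    EX = 1
    CURRENT = 2


def implausible_waves(waves: Sequence[Optional[Smoking]]) -> List[int]:
    """Return the indices of waves that report never-smoker after any earlier
    ex- or current-smoker wave. Missing waves (None) are skipped."""
    ever_smoked = False
    bad: List[int] = []
    for t, status in enumerate(waves):
        if status is None:
            continue
        if status == Smoking.NEVER:
            if ever_smoked:
                bad.append(t)
        else:
            ever_smoked = True
    return bad


# A current-smoker at wave 1 makes the never-smoker report at wave 3 implausible.
print(implausible_waves([Smoking.NEVER, Smoking.CURRENT, Smoking.EX, Smoking.NEVER]))  # [3]
```

Run on a completed (imputed) dataset, a non-empty result flags exactly the kind of implausible transition that unrestricted imputation can produce.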

Methods: We designed a simulation study based on the Longitudinal Study of Australian Children, in which the target analysis was the association between (incomplete) maternal smoking and childhood obesity. We set varying proportions of the maternal smoking data to be missing completely at random or missing at random. We compared the performance of fully conditional specification with multinomial logistic, ordinal logistic, and predictive mean matching imputation models; two-fold fully conditional specification; indicator-based imputation under multivariate normal imputation with projected distance-based rounding; and continuous imputation under multivariate normal imputation with calibration. Each of these multiple imputation methods was also applied with the restrictions accounted for using a semi-deterministic imputation procedure.
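
The abstract does not spell out the semi-deterministic procedure, so the Python sketch below is only one plausible reading of it, under stated assumptions. Waves that the observed data fully determine are filled deterministically (any missing wave before an observed never-smoker wave must itself be never-smoker), and the remaining missing waves are drawn stochastically, moving forward in time, from the imputation model's category probabilities with never-smoker excluded once smoking has been observed or imputed. The callable `predict_probs` stands in for whatever imputation model supplies those probabilities; it, `toy_model`, and the other names are hypothetical.

```python
import random
from enum import IntEnum
from typing import Callable, Dict, List, Optional, Sequence


class Smoking(IntEnum):
    # Hypothetical coding of the three categories.
    NEVER = 0
    EX = 1
    CURRENT = 2


Probs = Dict[Smoking, float]


def draw_restricted(probs: Probs, allowed: Sequence[Smoking], rng: random.Random) -> Smoking:
    """Sample one category from `probs`, renormalised over the allowed categories only."""
    return rng.choices(list(allowed), weights=[probs[c] for c in allowed], k=1)[0]


def semi_deterministic_impute(
    waves: Sequence[Optional[Smoking]],
    predict_probs: Callable[[int, List[Optional[Smoking]]], Probs],
    rng: random.Random,
) -> List[Smoking]:
    filled: List[Optional[Smoking]] = list(waves)

    # Deterministic step: every wave before the last observed never-smoker wave
    # must also be never-smoker, otherwise that observed value would be implausible.
    last_never = max((t for t, s in enumerate(waves) if s == Smoking.NEVER), default=-1)
    for t in range(last_never):
        if filled[t] is None:
            filled[t] = Smoking.NEVER

    # Stochastic step: impute the remaining gaps forward in time, dropping
    # never-smoker from the allowed set once any earlier wave is ex- or current-smoker.
    ever_smoked = False
    for t in range(len(filled)):
        if filled[t] is None:
            allowed = [Smoking.EX, Smoking.CURRENT] if ever_smoked else list(Smoking)
            filled[t] = draw_restricted(predict_probs(t, filled), allowed, rng)
        if filled[t] != Smoking.NEVER:
            ever_smoked = True
    return filled  # every entry is now a Smoking value


def toy_model(t: int, filled: List[Optional[Smoking]]) -> Probs:
    # Hypothetical model: same probabilities at every wave (a real one would use covariates).
    return {Smoking.NEVER: 0.5, Smoking.EX: 0.2, Smoking.CURRENT: 0.3}


# Wave 0 is always NEVER (deterministic); wave 3 is always EX or CURRENT (restricted draw).
print(semi_deterministic_impute([None, Smoking.NEVER, Smoking.CURRENT, None], toy_model, random.Random(42)))
```

Renormalising the model's probabilities over the allowed categories keeps the stochastic part of the draw proper while guaranteeing that no imputed sequence returns to never-smoker.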

Results: Overall, we observed reduced bias when multiple imputation methods were applied with the restrictions, and fully conditional specification with predictive mean matching performed best. Fully conditional specification and two-fold fully conditional specification both had severe convergence issues when imputing nominal variables using multinomial logistic regression. Both methods under multivariate normal imputation produced biased estimates when the restrictions were not accommodated; however, bias was substantially reduced when the restrictions were applied with continuous imputation under multivariate normal imputation with calibration.
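
The two multivariate normal imputation approaches compared above differ in how continuous draws are turned back into categories. The Python sketch below illustrates both, under assumptions that are ours rather than the paper's. For projected distance-based rounding, we assume one imputed indicator per category: each row is projected orthogonally onto the plane where the indicators sum to one and assigned to the nearest category vertex. For calibration, we assume smoking is imputed as a single ordered code (never < ex < current) and that a duplicate of the observed data, with its smoking values set to missing, is imputed alongside the real data; cut-points are then chosen so that the rounded duplicate reproduces the observed category proportions. All function names are hypothetical.

```python
import numpy as np


def projected_distance_round(indicators: np.ndarray) -> np.ndarray:
    """Map rows of imputed category indicators (n x K) to category codes 0..K-1.

    Each row is projected orthogonally onto the plane where the K indicators sum
    to one, then assigned to the closest vertex (unit vector) of that plane."""
    n, k = indicators.shape
    shift = (1.0 - indicators.sum(axis=1, keepdims=True)) / k
    projected = indicators + shift
    vertices = np.eye(k)
    # Squared Euclidean distance from every projected row to every vertex: shape (n, K).
    dist = ((projected[:, None, :] - vertices[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def calibrated_cutpoints(duplicate_imputed: np.ndarray, observed_codes: np.ndarray, k: int) -> np.ndarray:
    """Choose K-1 cut-points so that rounding the imputed duplicate copy reproduces
    the observed proportions of codes 0..K-1."""
    props = np.bincount(observed_codes, minlength=k) / len(observed_codes)
    return np.quantile(duplicate_imputed, np.cumsum(props)[:-1])


def round_with_cutpoints(values: np.ndarray, cutpoints: np.ndarray) -> np.ndarray:
    """Code = number of cut-points at or below the value (0 below the first cut-point)."""
    return np.searchsorted(cutpoints, values, side="right")


x = np.array([[0.2, 0.5, 0.1], [0.9, -0.1, 0.3], [0.3, 0.3, 0.6]])
print(projected_distance_round(x))  # [1 0 2]

# np.arange(10.0) is an arbitrary stand-in for the imputed duplicate copy.
cuts = calibrated_cutpoints(np.arange(10.0), np.array([0] * 5 + [1] * 3 + [2] * 2), k=3)
print(round_with_cutpoints(np.array([1.0, 5.0, 8.0]), cuts))  # [0 1 2]
```

By construction, the calibrated cut-points make the rounded duplicate copy match the observed category distribution, which is the property that motivates calibration over fixed rounding thresholds.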

Conclusion: In similar longitudinal settings, we recommend the use of fully conditional specification with predictive mean matching, with the restrictions applied during the imputation stage.
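
Predictive mean matching imputes each missing value by copying an observed value from a donor whose predicted mean is close to the recipient's. The Python sketch below shows a single PMM pass for one incomplete variable, treating hypothetical category codes (0 = never, 1 = ex, 2 = current) as numeric in the regression, since PMM only ever copies observed codes back. Two choices are our assumptions, not the paper's: the restriction is applied by drawing donors only from observed values allowed for that recipient, supplied through a hypothetical `allowed` callable; and the regression parameters are drawn from their approximate posterior before matching (type-1 matching), a common choice in PMM implementations. A full fully conditional specification analysis would cycle such passes over all incomplete variables and repeat them to create multiple imputed datasets.

```python
from typing import Callable, Optional, Set

import numpy as np


def pmm_impute(
    y: np.ndarray,
    X: np.ndarray,
    allowed: Callable[[int], Set[float]],
    k: int = 5,
    seed: Optional[int] = None,
) -> np.ndarray:
    """One predictive-mean-matching pass for an incomplete variable `y` (np.nan = missing)
    given complete predictors `X`. `allowed(i)` returns the codes recipient i may take."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    Xo, yo = Xd[obs], y[obs]
    n_obs, p = Xo.shape

    # Least-squares fit on the observed cases.
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat

    # Draw sigma^2 and beta from their approximate posterior under a flat prior.
    sigma2 = resid @ resid / rng.chisquare(n_obs - p)
    beta_draw = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(Xo.T @ Xo))

    pred_donors = Xo @ beta_hat  # donors matched on beta_hat, recipients on the draw
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        pred_i = Xd[i] @ beta_draw
        # Restriction: only donors whose observed code is allowed for recipient i.
        candidates = np.flatnonzero(np.isin(yo, list(allowed(i))))
        if candidates.size == 0:
            raise ValueError(f"no donor has a value allowed for row {i}")
        closest = candidates[np.argsort(np.abs(pred_donors[candidates] - pred_i))[:k]]
        y_imp[i] = yo[rng.choice(closest)]  # copy one of the k closest donors' values
    return y_imp


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.digitize(X[:, 0] + 0.3 * rng.normal(size=200), [-0.5, 0.5]).astype(float)
y[:40] = np.nan  # pretend the first 40 rows are missing at this wave
# Toy restriction: suppose these 40 people smoked at an earlier wave, so never (0) is excluded.
imputed = pmm_impute(y, X, allowed=lambda i: {1.0, 2.0}, seed=1)
print(np.unique(imputed[:40]))
```

Because PMM copies observed codes, every imputed value is already a valid category; restricting the donor pool is what additionally keeps each imputed sequence free of implausible transitions.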

Citing Articles

A machine learning analysis of patient and imaging factors associated with achieving clinically substantial outcome improvements following total shoulder arthroplasty: Implications for selecting anatomic or reverse prostheses.

Kunze K, Bobko A, Mathew J, Polce E, Manzi J, Nicholson A. Shoulder Elbow. 2024; 16(4):382-389.

PMID: 39318416 PMC: 11418670. DOI: 10.1177/17585732231187124.


Handling missing data and measurement error for early-onset myopia risk prediction models.

Lai H, Gao K, Li M, Li T, Zhou X, Zhou X. BMC Med Res Methodol. 2024; 24(1):194.

PMID: 39243025 PMC: 11378546. DOI: 10.1186/s12874-024-02319-x.


Demographic, health, physical activity, and workplace factors are associated with lower healthy working life expectancy and life expectancy at age 50.

Lynch M, Bucknall M, Jagger C, Kingston A, Wilkie R. Sci Rep. 2024; 14(1):5936.

PMID: 38467680 PMC: 10928117. DOI: 10.1038/s41598-024-53095-z.


The Influence of Airborne Particulate Matter on the Risk of Gestational Diabetes Mellitus: A Large Retrospective Study in Chongqing, China.

Zeng X, Zhan Y, Zhou W, Qiu Z, Wang T, Chen Q. Toxics. 2024; 12(1).

PMID: 38250975 PMC: 10818620. DOI: 10.3390/toxics12010019.


Early inflammatory markers as prognostic indicators following allogeneic stem cell transplantation.

Verma K, Croft W, Greenwood D, Stephens C, Malladi R, Nunnick J. Front Immunol. 2024; 14:1332777.

PMID: 38235129 PMC: 10791949. DOI: 10.3389/fimmu.2023.1332777.



5.
Karahalios A, Baglietto L, Lee K, English D, Carlin J, Simpson J . The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: a simulation study. Emerg Themes Epidemiol. 2013; 10(1):6. PMC: 3751092. DOI: 10.1186/1742-7622-10-6. View