Imputation of Missing Longitudinal Data: a Comparison of Methods

Overview

Journal: J Clin Epidemiol

Publisher: Elsevier

Specialty: Public Health

Date: 2003 Oct 22

PMID: 14568628

Citations: 132

Authors

Jean Mundahl Engels

Paula Diehr

Abstract

Background and Objectives: Missing information is inevitable in longitudinal studies and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depression, weight, cognitive functioning, and self-rated health in a longitudinal cohort of older adults.

Methods: We identified situations where a person had a known value following one or more missing values, and treated the known value as a "missing value." This "missing value" was imputed using each method and compared to the observed value. Methods were compared on the root mean square error, mean absolute deviation, bias, and relative variance of the estimates.
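
The abstract names the four comparison criteria but not their formulas. As a minimal Python sketch, assuming bias is the mean of (imputed minus observed), the mean absolute deviation is the mean absolute difference, RMSE is the square root of the mean squared difference, and relative variance is the variance of the imputed values divided by the variance of the observed values, scoring one imputation method could look like this. The function name `evaluate_imputations` and these exact definitions are assumptions, not taken from the paper.

```python
import math
from statistics import mean, pvariance


def evaluate_imputations(imputed, observed):
    """Score imputed "missing values" against the values actually observed.

    Assumed definitions (not stated in the abstract):
      bias              = mean(imputed - observed)
      rmse              = sqrt(mean((imputed - observed) ** 2))
      mad               = mean(|imputed - observed|)
      relative_variance = var(imputed) / var(observed)
    """
    if len(imputed) != len(observed) or not imputed:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [i - o for i, o in zip(imputed, observed)]
    return {
        "bias": mean(errors),
        "rmse": math.sqrt(mean(e * e for e in errors)),
        "mad": mean(abs(e) for e in errors),
        # Same (population) denominator on both sides, so n vs n-1 cancels.
        "relative_variance": pvariance(imputed) / pvariance(observed),
    }
```

A relative variance below 1 corresponds to imputed values that are less spread out than the real ones, which is how the Results' finding of a variance that was "too low" would show up here. The sign of the bias depends on whether higher scores mean better or worse health on a given scale.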

Results: Most imputation methods were biased toward estimating the "missing value" as too healthy, and most estimates had a variance that was too low. Imputed values based on a person's values before and after the "missing value" were superior to other methods, followed by imputations based on a person's values before the "missing value." Imputations that used no information specific to the person, such as using the sample mean, had the worst performance.
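
The Results contrast three families of imputation: a person's values before and after the gap, a person's values before the gap only, and information not specific to the person, such as the sample mean. The abstract does not list the 14 individual methods, so the Python sketch below uses one hypothetical representative per family (linear interpolation, last observation carried forward, and the mean of other participants at the same wave), together with the masking step described in the Methods. All function names are illustrative, and the fallback rules noted in the comments are choices made for this sketch.

```python
from statistics import mean
from typing import List, Optional, Tuple

# One person's measurements by study wave; None marks a missing wave.
Series = List[Optional[float]]


def find_targets(series: Series) -> List[int]:
    """Waves holding a known value that directly follows one or more missing
    values. Also requiring an earlier observed value is an assumption made
    here so that every method has something to work with."""
    return [
        t for t in range(1, len(series))
        if series[t] is not None
        and series[t - 1] is None
        and any(v is not None for v in series[: t - 1])
    ]


def mask(series: Series, t: int) -> Series:
    """Copy of the series with the known value at wave t hidden."""
    return series[:t] + [None] + series[t + 1:]


def nearest(series: Series, t: int, step: int) -> Optional[Tuple[int, float]]:
    """Nearest observed (wave, value) before (step=-1) or after (step=+1) t."""
    i = t + step
    while 0 <= i < len(series):
        if series[i] is not None:
            return i, series[i]
        i += step
    return None


def impute_before(series: Series, t: int) -> Optional[float]:
    """Person-specific, earlier values only: last observation carried forward."""
    prev = nearest(series, t, -1)
    return None if prev is None else prev[1]


def impute_before_after(series: Series, t: int) -> Optional[float]:
    """Person-specific, earlier and later values: linear interpolation between
    the nearest observed waves. Falls back to carrying forward when nothing
    later was observed (a choice made for this sketch)."""
    prev, nxt = nearest(series, t, -1), nearest(series, t, +1)
    if prev is None or nxt is None:
        return impute_before(series, t)
    (i0, y0), (i1, y1) = prev, nxt
    return y0 + (y1 - y0) * (t - i0) / (i1 - i0)


def impute_sample_mean(others: List[Series], t: int) -> Optional[float]:
    """No person-specific information: mean of other people's values at wave t."""
    values = [s[t] for s in others if s[t] is not None]
    return mean(values) if values else None
```

For a cohort, each method would be applied to `mask(series, t)` at every wave returned by `find_targets`, and the imputed values would then be scored against the hidden ones using the criteria listed in the Methods.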

Conclusions: We conclude that, in longitudinal studies where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data in a longitudinal sequence should be imputed from the available longitudinal data for that person.
