
Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking

Overview
Journal J Biomed Inform
Publisher Elsevier
Date 2023 Jul 10
PMID 37429511
Abstract

The imputation of missing values in multivariate time series (MTS) data is critical to ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods is limited to one or two data sets, low missing rates, and missing values that occur completely at random. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. Imputation performance depends on data types, individual variable statistics, and missing value rates and types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.
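The distinction the abstract draws between longitudinal (across time) and cross-sectional (across variables) imputation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, values are masked completely at random at a chosen rate, linear interpolation and a least-squares regression stand in for the far richer deep methods the survey benchmarks, and all function names (`make_synthetic_series`, `mask_completely_at_random`, `impute_longitudinal`, `impute_cross_sectional`, `rmse`) are hypothetical.

```python
import numpy as np


def make_synthetic_series(n_steps, rng):
    """Synthetic 3-variable time series (illustrative only, not health data)."""
    t = np.linspace(0.0, 4.0 * np.pi, n_steps)
    a = np.sin(t)
    b = np.cos(t)
    # The third variable depends on the other two, so it carries cross-sectional signal.
    c = a + 0.5 * b + 0.05 * rng.standard_normal(n_steps)
    return np.column_stack([a, b, c])


def mask_completely_at_random(x, rate, rng):
    """Set roughly `rate` of the entries to NaN, chosen independently of the data."""
    mask = rng.random(x.shape) < rate
    x_missing = x.copy()
    x_missing[mask] = np.nan
    return x_missing, mask


def impute_longitudinal(x_missing):
    """Fill each variable by linear interpolation along the time axis."""
    filled = x_missing.copy()
    t = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        observed = ~np.isnan(x_missing[:, j])
        if observed.any():
            filled[~observed, j] = np.interp(
                t[~observed], t[observed], x_missing[observed, j]
            )
    return filled


def impute_cross_sectional(x_missing):
    """Fill each variable from the other variables at the same time step.

    A linear regression is fit on fully observed rows. Gaps in the
    predictors are temporarily filled with column means.
    """
    filled = x_missing.copy()
    col_means = np.nanmean(x_missing, axis=0)
    predictors = np.where(np.isnan(x_missing), col_means, x_missing)
    complete = ~np.isnan(x_missing).any(axis=1)
    for j in range(x_missing.shape[1]):
        others = np.delete(predictors, j, axis=1)
        design = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(
            design[complete], x_missing[complete, j], rcond=None
        )
        gaps = np.isnan(x_missing[:, j])
        filled[gaps, j] = design[gaps] @ coef
    return filled


def rmse(x_true, x_imputed, mask):
    """Root-mean-square error computed only on the artificially masked entries."""
    diff = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = make_synthetic_series(200, rng)
    for rate in (0.1, 0.3, 0.5):
        x_missing, mask = mask_completely_at_random(x, rate, rng)
        print(
            f"rate={rate:.1f}",
            f"longitudinal RMSE={rmse(x, impute_longitudinal(x_missing), mask):.3f}",
            f"cross-sectional RMSE={rmse(x, impute_cross_sectional(x_missing), mask):.3f}",
        )
```

Scoring the error only on entries that were masked on purpose is what makes a comparison across missing rates possible, since the true values are known there. Which of the two simple imputers scores better on this toy data says nothing about the methods or data sets evaluated in the survey.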

Citing Articles

Data-driven ergonomic risk assessment of complex hand-intensive manufacturing processes.

Krishnan A, Yang X, Seth U, Jeyachandran J, Ahn J, Gardner R. Commun Eng. 2025; 4(1):45.

PMID: 40075152 PMC: 11903948. DOI: 10.1038/s44172-025-00382-w.


Predicting rapid decline in kidney function among type 2 diabetes patients: A machine learning approach.

Nakahara E, Waki K, Kurasawa H, Mimura I, Seki T, Fujino A. Heliyon. 2025; 11(1):e40566.

PMID: 39807510 PMC: 11728953. DOI: 10.1016/j.heliyon.2024.e40566.


Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.

Ren W, Liu Z, Wu Y, Zhang Z, Hong S, Liu H. Health Data Sci. 2024; 4:0176.

PMID: 39635227 PMC: 11615160. DOI: 10.34133/hds.0176.


Forecasting the trend of tuberculosis incidence in Anhui Province based on machine learning optimization algorithm, 2013-2023.

Zhang Y, Ma H, Wang H, Xia Q, Wu S, Meng J. BMC Pulm Med. 2024; 24(1):536.

PMID: 39462337 PMC: 11520048. DOI: 10.1186/s12890-024-03296-z.


Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

Kowsar I, Rabbani S, Samad M. Proc (IEEE Int Conf Healthc Inform). 2024; 2024:177-182.

PMID: 39387063 PMC: 11463999. DOI: 10.1109/ichi61247.2024.00030.

