
The Impact of Data Imputation on Air Quality Prediction Problem

Overview
Journal: PLoS One
Date: 2024 Sep 12
PMID: 39264957
Abstract

With rising environmental concerns, accurate air quality prediction has become paramount, as it helps in planning preventive measures and policies against the health hazards and environmental problems caused by poor air quality. Air quality data are usually time series. However, for various reasons, missing values are often encountered in the collected datasets during data preparation and aggregation. Failing to analyze and handle missing data significantly hinders the data analysis process. To address this issue, this paper offers an extensive review of air quality prediction and of missing data imputation techniques for time series, particularly in relation to environmental challenges. In addition, we empirically assess eight imputation methods (mean, median, kNNI, MICE, SAITS, BRITS, MRNN, and Transformer) to examine their impact on air quality data. The evaluation is conducted on diverse air quality datasets gathered from numerous cities worldwide. Based on these evaluations, we offer practical recommendations for practitioners dealing with missing data in environmental time series.
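To make the two simplest methods named above concrete, here is a minimal sketch of mean and median imputation on a short series, scored by hiding known values and measuring the error on them. The readings are made up for illustration, the helper names `impute` and `masked_mae` are hypothetical, and the masking-based scoring is an assumption about how such a comparison could be run, not a description of the paper's evaluation protocol.

```python
import math
import statistics


def impute(series, strategy="mean"):
    """Fill NaN entries with the mean or median of the observed values."""
    observed = [x for x in series if not math.isnan(x)]
    if not observed:
        raise ValueError("series has no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if math.isnan(x) else x for x in series]


def masked_mae(truth, imputed, masked_idx):
    """Mean absolute error measured only at the artificially hidden positions."""
    return sum(abs(truth[i] - imputed[i]) for i in masked_idx) / len(masked_idx)


# Made-up pollutant readings with one spike (illustrative values only).
truth = [12.0, 15.0, 14.0, 90.0, 16.0, 13.0, 15.0]

# Hide two known values to simulate missing data.
masked_idx = [2, 5]
observed = [float("nan") if i in masked_idx else x for i, x in enumerate(truth)]

mean_filled = impute(observed, "mean")
median_filled = impute(observed, "median")

print("mean MAE:  ", masked_mae(truth, mean_filled, masked_idx))
print("median MAE:", masked_mae(truth, median_filled, masked_idx))
```

In this toy series the single spike pulls the mean upward, so the median fill lands closer to the hidden values. That outcome is specific to these made-up numbers and says nothing about the results reported in the paper.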
