Constructing Synthetic Populations in the Age of Big Data

Overview
Publisher Biomed Central
Specialty Public Health
Date 2023 Nov 1
PMID 37907904
Abstract

Background: Developing public health intervention models with micro-simulation requires extensive personal information about inhabitants, such as socio-demographic, economic and health data. Such data must reflect realistic scenarios, yet they are confidential: they can be collected only in secured environments and are not directly available to open-source micro-simulation models. This paper illustrates a method for constructing synthetic data by predicting individual features with models based on confidential data on the health and socio-economic determinants of the entire Dutch population.

Methods: Administrative records and health registry data were linked to socio-economic characteristics and self-reported lifestyle factors. For the entire Dutch population (n = 16,778,708), all socio-demographic information except lifestyle factors was available. Lifestyle factors were available from the 2012 Dutch Health Monitor (n = 370,835). Regression models were used to predict individual features sequentially.
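The sequential idea can be illustrated with a minimal sketch: each feature is drawn conditional on the features already generated for that synthetic person. This is not the authors' implementation. As a simplifying assumption, conditional frequency tables stand in for the regression models, and the feature names, the toy records and the helpers `fit_conditional` and `synthesize` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy records; the real input is confidential linked microdata.
TOY_RECORDS = [
    {"sex": "F", "age_group": "18-39", "smoker": "no"},
    {"sex": "F", "age_group": "18-39", "smoker": "yes"},
    {"sex": "F", "age_group": "40-64", "smoker": "no"},
    {"sex": "M", "age_group": "18-39", "smoker": "yes"},
    {"sex": "M", "age_group": "40-64", "smoker": "no"},
    {"sex": "M", "age_group": "40-64", "smoker": "yes"},
]


def fit_conditional(records, target, predictors):
    """Count how often each target value occurs for each predictor combination.

    Assumption: a frequency table replaces the regression model of the paper.
    """
    tables = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[p] for p in predictors)
        tables[key][rec[target]] += 1
    return tables


def synthesize(records, order, n, seed=0):
    """Generate n synthetic persons, predicting features in the given order."""
    rng = random.Random(seed)
    # One model per feature, conditioned on all features earlier in the order.
    models = [fit_conditional(records, order[i], order[:i])
              for i in range(len(order))]
    synthetic = []
    for _ in range(n):
        person = {}
        for i, feature in enumerate(order):
            key = tuple(person[p] for p in order[:i])
            counts = models[i][key]
            values = list(counts.keys())
            weights = list(counts.values())
            # Draw the feature value in proportion to its observed frequency.
            person[feature] = rng.choices(values, weights=weights, k=1)[0]
        synthetic.append(person)
    return synthetic


ORDER = ["sex", "age_group", "smoker"]
synthetic_population = synthesize(TOY_RECORDS, ORDER, n=50, seed=1)
```

Because each table conditions on every earlier feature, this sketch can only reproduce feature combinations that occur in the input. A parametric regression model would smooth over sparse combinations instead, which is one reason to prefer it in practice.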

Results: The synthetic population resembles the original confidential population. Features predicted in the early stages of the sequential procedure are virtually identical to those in the original population, whereas features predicted in later stages accumulate the limitations of data quality and of previously modelled features.
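One simple way to quantify how closely a synthetic feature resembles its original is to compare the two marginal distributions. The sketch below uses the total variation distance, which is an assumed choice: the abstract does not state which comparison the authors used. The helper names and the toy values are hypothetical.

```python
from collections import Counter


def marginal(values):
    """Return the relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions.

    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in support)


# Hypothetical toy values for a single feature in each population.
original_smoker = ["yes", "yes", "yes", "no"]
synthetic_smoker = ["yes", "yes", "no", "no"]

distance = total_variation(marginal(original_smoker),
                           marginal(synthetic_smoker))
```

Computing this distance per feature, in the order in which the features were predicted, would show whether the mismatch grows in the later stages of the procedure.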

Conclusions: By combining socio-demographic, economic, health and lifestyle data at the individual level on a large scale, our method provides a powerful tool for constructing a synthetic population of good quality and with no confidentiality issues.