
Synthetic Data for Privacy-preserving Clinical Risk Prediction

Overview
Journal: Sci Rep
Specialty: Science
Date: 2024 Oct 28
PMID: 39463411
Abstract

Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches, such as federated learning, analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system.

Citing Articles

Embracing Generative Artificial Intelligence in Clinical Research and Beyond: Opportunities, Challenges, and Solutions.

Foote H, Hong C, Anwar M, Borentain M, Bugin K, Dreyer N. JACC Adv. 2025; 4(3):101593.

PMID: 39923329 PMC: 11850149. DOI: 10.1016/j.jacadv.2025.101593.


CHeart: A Conditional Spatio-Temporal Generative Model for Cardiac Anatomy.

Qiao M, Wang S, Qiu H, de Marvao A, O'Regan D, Rueckert D. IEEE Trans Med Imaging. 2023; 43(3):1259-1269.

PMID: 37948142 PMC: 7615911. DOI: 10.1109/TMI.2023.3331982.
