
Application of Bayesian Networks to Generate Synthetic Health Data

Overview
Date 2020 Dec 28
PMID 33367620
Citations 15
Abstract

Objective: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that medical organizations could use to distribute health data to researchers, reducing the need for access to real data. We hypothesize that the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.

Materials and Methods: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.
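The general idea of simulating records from a Bayesian network can be illustrated with a minimal sketch. It is not the authors' implementation: the graph is fixed by hand rather than learned, and the variable names (`age_group`, `diabetes`, `heart_disease`), the toy records, and the smoothing constant `alpha` are all assumptions made for illustration. The sketch estimates each conditional probability table by smoothed counting and then draws synthetic rows by ancestral sampling, visiting parents before children.

```python
import random
from collections import Counter, defaultdict


def topological_order(parents):
    """Return the nodes so that every parent precedes its children (assumes a DAG)."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        for p in parents[node]:
            visit(p)
        seen.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order


def fit_counts(records, parents):
    """Count how often each value occurs for each parent configuration."""
    values = {v: sorted({r[v] for r in records}) for v in parents}
    counts = {v: defaultdict(Counter) for v in parents}
    for r in records:
        for v, pa in parents.items():
            counts[v][tuple(r[p] for p in pa)][r[v]] += 1
    return values, counts


def sample_records(records, parents, n, seed=0, alpha=1.0):
    """Draw n synthetic rows by ancestral sampling from smoothed count tables."""
    rng = random.Random(seed)
    values, counts = fit_counts(records, parents)
    order = topological_order(parents)
    rows = []
    for _ in range(n):
        row = {}
        for v in order:
            key = tuple(row[p] for p in parents[v])
            seen_counts = counts[v].get(key, Counter())
            # Laplace smoothing (alpha) keeps unseen parent configurations usable.
            weights = [seen_counts[x] + alpha for x in values[v]]
            row[v] = rng.choices(values[v], weights=weights)[0]
        rows.append(row)
    return rows


# Hypothetical toy data and a hand-specified graph (illustration only).
toy_records = (
    [{"age_group": "old", "diabetes": 1, "heart_disease": 1}] * 30
    + [{"age_group": "old", "diabetes": 0, "heart_disease": 0}] * 20
    + [{"age_group": "young", "diabetes": 0, "heart_disease": 0}] * 45
    + [{"age_group": "young", "diabetes": 1, "heart_disease": 0}] * 5
)
toy_parents = {
    "age_group": [],
    "diabetes": ["age_group"],
    "heart_disease": ["age_group", "diabetes"],
}
synthetic = sample_records(toy_records, toy_parents, n=200, seed=42)
```

In practice the graph would be learned from the data with a structure-learning algorithm and a scoring function; that step is omitted here to keep the sketch short.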

Results: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.
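One simple way to examine whether rare variables survive synthesis is to compare their prevalence in the real and synthetic data. The sketch below is an illustration under stated assumptions, not the metric used in the study: the 5% rarity `threshold`, the diagnosis code names, and the toy records are hypothetical.

```python
def prevalence(records, variables):
    """Fraction of records in which each binary variable is set."""
    n = len(records)
    return {v: sum(1 for r in records if r[v]) / n for v in variables}


def rare_event_report(real, synthetic, threshold=0.05):
    """For variables that are rare in the real data, report both prevalences."""
    variables = sorted(real[0])
    p_real = prevalence(real, variables)
    p_syn = prevalence(synthetic, variables)
    return {
        v: {
            "real": p_real[v],
            "synthetic": p_syn[v],
            # A rare variable counts as preserved if it appears at all.
            "preserved": p_syn[v] > 0,
        }
        for v in variables
        if 0 < p_real[v] < threshold  # assumed rarity cutoff
    }


# Hypothetical binary diagnosis indicators (illustration only).
real_rows = (
    [{"code_A": 1, "code_B": 0, "code_C": 0}] * 40
    + [{"code_A": 0, "code_B": 1, "code_C": 0}] * 2
    + [{"code_A": 0, "code_B": 0, "code_C": 1}] * 1
    + [{"code_A": 0, "code_B": 0, "code_C": 0}] * 57
)
synthetic_rows = (
    [{"code_A": 1, "code_B": 0, "code_C": 0}] * 38
    + [{"code_A": 0, "code_B": 1, "code_C": 0}] * 3
    + [{"code_A": 0, "code_B": 0, "code_C": 0}] * 59
)
report = rare_event_report(real_rows, synthetic_rows)
```

Here `code_A` is common and is excluded from the report, while `code_B` and `code_C` are rare in the real rows; a generator that drops `code_C` entirely would be flagged as not preserving it.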

Discussion: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.

Conclusion: We conclude that the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.

Citing Articles

Large language models and synthetic health data: progress and prospects.

Smolyak D, Bjarnadottir M, Crowley K, Agarwal R JAMIA Open. 2024; 7(4):ooae114.

PMID: 39464796 PMC: 11512648. DOI: 10.1093/jamiaopen/ooae114.


On the evaluation of synthetic longitudinal electronic health records.

Achterberg J, Haas M, Spruit M BMC Med Res Methodol. 2024; 24(1):181.

PMID: 39143466 PMC: 11323671. DOI: 10.1186/s12874-024-02304-4.


Comparison of Synthetic Data Generation Techniques for Control Group Survival Data in Oncology Clinical Trials: Simulation Study.

Akiya I, Ishihara T, Yamamoto K JMIR Med Inform. 2024; 12:e55118.

PMID: 38889082 PMC: 11196245. DOI: 10.2196/55118.


Synthetic Data Improve Survival Status Prediction Models in Early-Onset Colorectal Cancer.

Kim H, Jang W, Sim W, Kim H, Choi J, Baek E JCO Clin Cancer Inform. 2024; 8:e2300201.

PMID: 38271642 PMC: 10830088. DOI: 10.1200/CCI.23.00201.


New Approach for Generating Synthetic Medical Data to Predict Type 2 Diabetes.

Tagmatova Z, Abdusalomov A, Nasimov R, Nasimova N, Dogru A, Cho Y Bioengineering (Basel). 2023; 10(9).

PMID: 37760133 PMC: 10525473. DOI: 10.3390/bioengineering10091031.

