Generating Unseen Diseases Patient Data Using Ontology Enhanced Generative Adversarial Networks

Overview
Journal NPJ Digit Med
Date 2025 Jan 3
PMID 39753917
Abstract

Generating realistic synthetic health data (e.g., electronic health records) holds promise for fundamental research, AI model development, and stronger data privacy safeguards. Generative Adversarial Networks (GANs) have been employed for this purpose, but their performance is largely constrained by their reliance on training data, rendering them inadequate for rare or previously unseen diseases. This study proposes Onto-CGAN, a novel generative framework that combines knowledge from disease ontologies with GANs to generate patient data for unseen diseases, that is, diseases not present in the training data. The quality of the generated data is evaluated using variable distributions, correlation coefficients, and machine learning model performance. Our findings demonstrate that Onto-CGAN generates data for unseen diseases with statistical characteristics comparable to those of the real data, and that this data significantly improves the training of machine learning models. This approach addresses the scarcity of data for rare diseases, offering valuable applications in data augmentation, hypothesis generation, and preclinical validation of clinical models.
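The abstract names three evaluation criteria without specifying how they are computed. As a rough illustration only, the first two (variable distributions and correlation coefficients) could be compared between a real and a synthetic table along the following lines. This is a minimal sketch, not the authors' evaluation code: the choice of the two-sample Kolmogorov-Smirnov statistic and of a mean absolute gap in pairwise Pearson correlations is an assumption, and the function names `ks_statistic`, `pearson`, and `correlation_gap` are hypothetical. Tables are assumed to be dictionaries mapping a column name to a list of numeric values.

```python
import math
from itertools import combinations


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synthetic)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


def correlation_gap(real_table, synthetic_table):
    """Mean absolute difference between pairwise Pearson correlations
    in the real and the synthetic table (assumed metric, lower is better)."""
    pairs = list(combinations(sorted(real_table), 2))
    diffs = [
        abs(pearson(real_table[a], real_table[b])
            - pearson(synthetic_table[a], synthetic_table[b]))
        for a, b in pairs
    ]
    return sum(diffs) / len(diffs)
```

Under these assumptions, a per-variable `ks_statistic` near 0 and a `correlation_gap` near 0 would indicate that the synthetic records reproduce both the marginal distributions and the inter-variable relationships of the real records. The third criterion, machine learning model performance, would additionally require training a model and is not sketched here.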
