
Synthetic Data Generation with Probabilistic Bayesian Networks

Overview
Journal: Math Biosci Eng
Specialty: Biology
Date: 2021 Nov 24
PMID: 34814315
Citations: 5
Abstract

Bayesian network (BN) modeling is a prominent and increasingly popular method in computational systems biology. It aims to construct, from large heterogeneous biological datasets, network graphs that reflect the underlying biological relationships. Currently, a variety of strategies exist for evaluating the performance of BN methodology, ranging from artificial benchmark datasets and models, to specialized biological benchmark datasets, to simulation studies that generate synthetic data from predefined network models. The last is arguably the most comprehensive approach; however, existing implementations often rely on explicit and implicit assumptions that may be unrealistic in a typical biological data analysis scenario, or are poorly equipped for automated generation of arbitrary models. In this study, we develop a purely probabilistic simulation framework that addresses the demands of statistically sound simulation studies in an unbiased fashion. Additionally, we expand on our current understanding of the theoretical notions of causality and dependence / conditional independence in BNs and the Markov blankets within.
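
To make the idea of generating synthetic data from a predefined network model concrete, the following is a minimal sketch of ancestral (forward) sampling from a small discrete BN, together with a helper that reads off a node's Markov blanket (parents, children, and the children's other parents). It is not the framework developed in the study. The three-node network, its probabilities, and every function name are hypothetical and chosen only for illustration, and the code assumes that the graph is acyclic and that all variables are binary.

```python
import random

# Hypothetical binary network: A -> B and A -> C.
# Each node maps to (parents, CPT); the CPT maps a tuple of parent values
# to P(node = 1 | parents). All numbers are illustrative assumptions.
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.1, (1,): 0.8}),
    "C": (("A",), {(0,): 0.5, (1,): 0.2}),
}


def topological_order(network):
    """Order nodes so that every parent precedes its children (assumes a DAG)."""
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in network[node][0]:
            visit(parent)
        order.append(node)

    for node in network:
        visit(node)
    return order


def sample_record(network, order, rng):
    """Draw one joint sample by sampling each node given its sampled parents."""
    record = {}
    for node in order:
        parents, cpt = network[node]
        p_one = cpt[tuple(record[p] for p in parents)]
        record[node] = 1 if rng.random() < p_one else 0
    return record


def generate(network, n, seed=0):
    """Generate n independent synthetic records from the network."""
    rng = random.Random(seed)
    order = topological_order(network)
    return [sample_record(network, order, rng) for _ in range(n)]


def markov_blanket(network, target):
    """Return the parents, children, and co-parents of the target node."""
    parents = set(network[target][0])
    children = {n for n, (ps, _) in network.items() if target in ps}
    co_parents = {p for c in children for p in network[c][0]} - {target}
    return parents | children | co_parents


if __name__ == "__main__":
    data = generate(NETWORK, 1000, seed=1)
    print(data[:3])
    print(markov_blanket(NETWORK, "B"))
```

Sampling in topological order guarantees that a node's parents already have values when the node itself is drawn, so each record is a draw from the joint distribution encoded by the network. A structure-learning method can then be scored by how well it recovers the known graph from such records.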

Citing Articles

Changes in expression of VGF, SPECC1L, HLA-DRA and RANBP3L act with APOE E4 to alter risk for late onset Alzheimer's disease.

Branciamore S, Gogoshin G, Rodin A, Myers A. Sci Rep. 2024; 14(1):14954.

PMID: 38942763 PMC: 11213882. DOI: 10.1038/s41598-024-65010-7.


The Human Brainome: changes in expression of VGF, SPECC1L, HLA-DRA and RANBP3L act with APOE E4 to alter risk for late onset Alzheimer's disease.

Branciamore S, Gogoshin G, Rodin A, Myers A. Res Sq. 2024.

PMID: 38168398 PMC: 10760217. DOI: 10.21203/rs.3.rs-3678057/v1.


Bayesian network models identify co-operative GPCR:G protein interactions that contribute to G protein coupling.

Mukhaleva E, Ma N, van der Velden W, Gogoshin G, Branciamore S, Bhattacharya S. bioRxiv. 2023.

PMID: 37873104 PMC: 10592737. DOI: 10.1101/2023.10.09.561618.


Bayesian network modeling of risk and prodromal markers of Parkinson's disease.

Sood M, Suenkel U, von Thaler A, Zacharias H, Brockmann K, Eschweiler G. PLoS One. 2023; 18(2):e0280609.

PMID: 36827273 PMC: 9955606. DOI: 10.1371/journal.pone.0280609.


Causal Datasheet for Datasets: An Evaluation Guide for Real-World Data Analysis and Data Collection Design Using Bayesian Networks.

Butcher B, Huang V, Robinson C, Reffin J, Sgaier S, Charles G. Front Artif Intell. 2021; 4:612551.

PMID: 34337389 PMC: 8320747. DOI: 10.3389/frai.2021.612551.
