
Gene Prioritization Using Bayesian Matrix Factorization with Genomic and Phenotypic Side Information

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Jun 29
PMID 29949967
Citations 13
Abstract

Motivation: Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known.
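The idea of completing a sparse gene-phenotype matrix with the help of gene side information can be illustrated with a minimal sketch. It is not the Macau or BayesianDataFusion.jl API, and it is not Bayesian: where the paper describes Bayesian matrix factorization, this uses plain gradient descent on a regularized squared error to obtain point estimates. It also uses gene-side features only, and all names (`fit_mf_with_side_info`, `gene_feats`, `beta`, `u`, `v`), the hyperparameters, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_mf_with_side_info(rows, cols, vals, gene_feats, n_phenotypes,
                          n_latent=3, lam=0.01, lr=0.05, n_epochs=2000, seed=0):
    """Point-estimate factorization of a sparse gene-phenotype matrix.

    Each gene's latent vector is modelled as gene_feats[g] @ beta + u[g], so a
    gene with no observed entries still gets a latent vector from its features.
    (Hypothetical helper; a simplification, not the published sampler.)
    """
    rng = np.random.default_rng(seed)
    n_genes, n_feats = gene_feats.shape
    beta = 0.1 * rng.standard_normal((n_feats, n_latent))    # feature-to-latent link
    u = np.zeros((n_genes, n_latent))                         # gene-specific offsets
    v = 0.1 * rng.standard_normal((n_phenotypes, n_latent))   # phenotype factors
    losses = []
    for _ in range(n_epochs):
        g = gene_feats @ beta + u                             # gene latent vectors
        err = np.sum(g[rows] * v[cols], axis=1) - vals        # residuals on observed cells
        losses.append(float(np.mean(err ** 2)))
        # Gradients of 0.5 * mean squared error over the observed cells only.
        grad_g = np.zeros_like(g)
        grad_v = np.zeros_like(v)
        np.add.at(grad_g, rows, err[:, None] * v[cols])
        np.add.at(grad_v, cols, err[:, None] * g[rows])
        grad_g /= len(vals)
        grad_v /= len(vals)
        beta -= lr * (gene_feats.T @ grad_g + lam * beta)
        u -= lr * (grad_g + lam * u)
        v -= lr * (grad_v + lam * v)
    return beta, u, v, losses

# Synthetic example (assumed data, not the paper's benchmark).
rng = np.random.default_rng(1)
n_genes, n_phen, n_feats = 40, 12, 5
gene_feats = rng.standard_normal((n_genes, n_feats))
true_g = gene_feats @ (0.5 * rng.standard_normal((n_feats, 3)))
true_v = rng.standard_normal((n_phen, 3))
full = true_g @ true_v.T

mask = rng.random((n_genes, n_phen)) < 0.3   # sparsely filled matrix
mask[-1, :] = False                          # last gene: no known association
rows, cols = np.nonzero(mask)
vals = full[rows, cols]

beta, u, v, losses = fit_mf_with_side_info(rows, cols, vals, gene_feats, n_phen)

# Predicted scores for every gene-phenotype pair, including unobserved ones.
scores = (gene_feats @ beta + u) @ v.T
# Prioritize genes for phenotype 0: highest predicted score first.
ranked_genes = np.argsort(-scores[:, 0])
```

In this sketch the last gene has no observed entries, so its offset `u[-1]` stays at zero and its scores come entirely from `gene_feats[-1] @ beta`. This mirrors, in a simplified form, how side information allows non-trivial predictions for genes with no previously known disease association.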

Results: A novel aspect of our gene prioritization method is that it integrates not only data sources describing genes but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that the proposed model improves accuracy over the well-established gene prioritization method Endeavour. In particular, compared with Endeavour, the proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities.

Availability And Implementation: The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation.

Yu S, Wang Z, Nan J, Li A, Yang X, Tang X. JMIR Form Res. 2023; 7:e50998.

PMID: 37966892 PMC: 10687686. DOI: 10.2196/50998.


DeepGenePrior: A deep learning model for prioritizing genes affected by copy number variants.

Rahaie Z, Rabiee H, Alinejad-Rokny H. PLoS Comput Biol. 2023; 19(7):e1011249.

PMID: 37486921 PMC: 10399873. DOI: 10.1371/journal.pcbi.1011249.


A publication-wide association study (PWAS), historical language models to prioritise novel therapeutic drug targets.

Narganes-Carlon D, Crowther D, Pearson E. Sci Rep. 2023; 13(1):8366.

PMID: 37225853 PMC: 10209167. DOI: 10.1038/s41598-023-35597-4.


Predicting disease genes based on multi-head attention fusion.

Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. BMC Bioinformatics. 2023; 24(1):162.

PMID: 37085750 PMC: 10122338. DOI: 10.1186/s12859-023-05285-1.


HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression.

Jagodnik K, Shvili Y, Bartal A. PLoS One. 2023; 18(2):e0280839.

PMID: 36791052 PMC: 9931161. DOI: 10.1371/journal.pone.0280839.

