Gene Prioritization by Compressive Data Fusion and Chaining

Overview

Journal: PLoS Comput Biol

Specialty: Biology

Date: 2015 Oct 15

PMID: 26465776

Citations: 12

Authors

Marinka Zitnik

Edward A Nam

Christopher Dinh

Adam Kuspa

Gad Shaulsky

Blaz Zupan

Abstract

Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets with varying degrees of association with the prediction task, uses collective matrix factorization to compress the data, and uses chaining to relate the different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
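The idea of compressing relation matrices, chaining them across object types, and ranking genes by similarity to seed genes can be illustrated with a small sketch. This is not the Collage implementation: truncated SVD stands in for collective matrix factorization, the two relation matrices are random toy data, and every name here (`low_rank`, `chain`, `prioritize`, `gene_term`, `term_pathway`) is hypothetical.

```python
import numpy as np

def low_rank(matrix, rank):
    """Compress a relation matrix with truncated SVD.

    Assumption: SVD is a simple stand-in here, not Collage's
    collective matrix factorization.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def chain(relations, rank):
    """Multiply compressed relation matrices along a path of object types,
    e.g. gene -> term -> pathway, to get gene profiles in the last type."""
    profile = low_rank(relations[0], rank)
    for rel in relations[1:]:
        profile = profile @ low_rank(rel, rank)
    return profile

def prioritize(profiles, seed_idx):
    """Rank non-seed genes by mean cosine similarity to the seed genes."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)
    scores = (unit @ unit[seed_idx].T).mean(axis=1)
    seeds = set(seed_idx)
    ranking = [int(g) for g in np.argsort(-scores) if int(g) not in seeds]
    return ranking, scores

# Toy data (hypothetical): 20 genes, 6 annotation terms, 4 pathways.
rng = np.random.default_rng(0)
gene_term = rng.random((20, 6))
term_pathway = rng.random((6, 4))

profiles = chain([gene_term, term_pathway], rank=3)
ranking, scores = prioritize(profiles, seed_idx=[0, 1, 2, 3])
print(ranking[:8])  # eight highest-ranked candidate genes
```

In this sketch the term-by-pathway matrix says nothing about genes directly, yet it still shapes the gene profiles through the chain, which mirrors the abstract's point about using data sets that are only indirectly related to the prediction task.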

Citing Articles

Improving drug repositioning accuracy using non-negative matrix tri-factorization.

Li Q, Wang Y, Wang J, Zhao C. Sci Rep. 2025; 15(1):7840.

PMID: 40050702 PMC: 11885831. DOI: 10.1038/s41598-025-91757-8.

KGRDR: a deep learning model based on knowledge graph and graph regularized integration for drug repositioning.

Luo H, Yang H, Zhang G, Wang J, Luo J, Yan C. Front Pharmacol. 2025; 16:1525029.

PMID: 40008124 PMC: 11850324. DOI: 10.3389/fphar.2025.1525029.

PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications.

Korlepara D, C S V, Srivastava R, Pal P, Raza S, Kumar V. Sci Data. 2024; 11(1):180.

PMID: 38336857 PMC: 10858175. DOI: 10.1038/s41597-023-02872-y.

Graph representation learning in biomedicine and healthcare.

Li M, Huang K, Zitnik M. Nat Biomed Eng. 2022; 6(12):1353-1369.

PMID: 36316368 PMC: 10699434. DOI: 10.1038/s41551-022-00942-x.

Disease gene prediction with privileged information and heteroscedastic dropout.

Shu J, Li Y, Wang S, Xi B, Ma J. Bioinformatics. 2021; 37(Suppl_1):i410-i417.

PMID: 34252957 PMC: 8275341. DOI: 10.1093/bioinformatics/btab310.
