Gene Prioritization by Compressive Data Fusion and Chaining
Overview
Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets with varying degrees of association with the prediction task, uses collective matrix factorization to compress the data, and uses chaining to relate the different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
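The two ideas named in the abstract, chaining across object types and ranking by similarity to seed genes, can be illustrated with a small sketch. This is not the Collage implementation: the abstract does not specify the factorization or the similarity measure, so the sketch assumes that latent gene factors and relation blocks are already available (random toy matrices stand in for them here), that chaining is a product of those matrices along a path, and that similarity is the mean cosine similarity to the seeds. The names `chain_profiles` and `prioritize` are hypothetical.

```python
import numpy as np

def chain_profiles(gene_factors, chain):
    """Project gene latent factors along a chain of relation matrices.

    Assumption: chaining is modeled as a plain matrix product along a path
    of object types; the abstract does not give the exact construction.
    """
    profiles = gene_factors
    for relation in chain:
        profiles = profiles @ relation
    return profiles

def prioritize(profiles, seed_idx):
    """Rank non-seed genes by mean cosine similarity to the seed genes."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    # Avoid division by zero for all-zero profiles.
    unit = profiles / np.where(norms == 0, 1.0, norms)
    # Cosine similarity of every gene to every seed, averaged over seeds.
    scores = (unit @ unit[seed_idx].T).mean(axis=1)
    seeds = set(seed_idx)
    order = np.argsort(-scores)  # highest score first
    ranking = [int(i) for i in order if int(i) not in seeds]
    return ranking, scores

# Toy data only: 6 genes with 3 latent factors, and two hypothetical
# relation blocks linking gene factors to two further object types.
rng = np.random.default_rng(0)
gene_factors = rng.random((6, 3))
chain = [rng.random((3, 4)), rng.random((4, 2))]

profiles = chain_profiles(gene_factors, chain)
ranking, scores = prioritize(profiles, seed_idx=[0, 1])
print("candidate genes, best first:", ranking)
```

In a real setting the toy matrices would be replaced by factors learned from the data compendium, and several chains could be scored separately and then aggregated; how Collage combines them is not stated in the abstract.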