KDGene: Knowledge Graph Completion for Disease Gene Prediction Using Interactional Tensor Decomposition

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2024 Apr 12
PMID: 38605639
Abstract

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods construct biological networks and apply machine learning, particularly deep learning, to identify disease genes. However, these methods overlook the complex relations among entities in biological knowledge graphs, even though such information has been applied successfully in other areas of life science research. Knowledge graph embedding methods can learn the semantics of the different relations within a knowledge graph; nonetheless, existing representation learning techniques still perform suboptimally when applied to domain-specific biological data. To address these problems, we construct a biological knowledge graph centered on diseases and genes and develop KDGene, an end-to-end knowledge graph completion framework for disease gene prediction based on interactional tensor decomposition. KDGene incorporates an interaction module that bridges entity and relation embeddings within the tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and to predict disease genes more accurately. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, including both existing disease gene prediction methods and general-domain knowledge graph embedding methods. Moreover, a comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework for identifying candidate disease genes, and its results promise to provide valuable references for further wet-lab experiments. Data and source code are available at https://github.com/2020MEAI/KDGene.
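To make the general idea of knowledge graph completion by tensor decomposition concrete, here is a minimal sketch that ranks candidate genes as the missing tail of a (disease, relation, ?) triple. It uses a DistMult-style trilinear product, a standard tensor decomposition baseline, and not KDGene's model: KDGene's interaction module between entity and relation embeddings is not reproduced here. The embedding dimension, the gene names and the random embeddings are all hypothetical; in practice the embeddings would be learned from the knowledge graph.

```python
import random

# Hypothetical embedding size; real models learn embeddings of a tuned dimension.
DIM = 8


def random_vec(rng, dim=DIM):
    """Draw a toy embedding vector (stand-in for a learned embedding)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def distmult_score(head, rel, tail):
    """DistMult-style trilinear score: sum over k of h_k * r_k * t_k."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))


def rank_candidate_genes(disease_vec, rel_vec, gene_vecs):
    """Score every gene as the tail of (disease, rel, ?) and sort by descending score."""
    scores = {g: distmult_score(disease_vec, rel_vec, v) for g, v in gene_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Usage with hypothetical, randomly initialised embeddings.
rng = random.Random(0)
disease = random_vec(rng)                                  # e.g. one disease entity
rel = random_vec(rng)                                      # e.g. an "associated_with" relation
genes = {f"GENE_{i}": random_vec(rng) for i in range(5)}   # hypothetical candidate genes
ranking = rank_candidate_genes(disease, rel, genes)
print(ranking[:3])  # top-3 candidate genes under this toy model
```

Because every gene is scored against the same (disease, relation) pair, prediction reduces to sorting genes by score; methods such as KDGene differ mainly in how the score function combines the entity and relation embeddings.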

Citing Articles

Lin C, Li H, Wang J. LIMO-GCN: a linear model-integrated graph convolutional network for predicting Alzheimer disease genes. Brief Bioinform. 2024; 26(1). PMID: 39592152. PMC: 11596108. DOI: 10.1093/bib/bbae611.
