
The Research on Gene-disease Association Based on Text-mining of PubMed

Overview
Publisher Biomed Central
Specialty Biology
Date 2018 Feb 9
PMID 29415654
Citations 21
Abstract

Background: Associations between genes and diseases are critically important for prevention, diagnosis, and treatment. Although gene-disease relationships have been investigated extensively, many of the underpinnings of these associations have yet to be elucidated.

Methods: We propose a novel method that integrates the MeSH database, term weight (TW), and co-occurrence methods to predict gene-disease associations based on the cosine similarity between gene vectors and disease vectors. The vectors are derived from the text of documents in the PubMed database according to the occurrence and location of the gene or disease terms. The disease-related text data were optimized while constructing the vectors.
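The idea of scoring a gene-disease pair by cosine similarity can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each vector has one dimension per document, that a title mention is weighted more heavily than an abstract mention, and that terms are matched by simple substring search. The weights `TITLE_WEIGHT` and `ABSTRACT_WEIGHT`, the helper `build_vector`, and the toy documents are all hypothetical.

```python
import math

# Hypothetical location weights (assumption): a title mention counts
# more than an abstract mention.
TITLE_WEIGHT = 2.0
ABSTRACT_WEIGHT = 1.0


def build_vector(term, documents):
    """Map each document ID to a location-based weight for `term`."""
    term = term.lower()
    vector = {}
    for doc_id, doc in documents.items():
        weight = 0.0
        if term in doc["title"].lower():
            weight += TITLE_WEIGHT
        if term in doc["abstract"].lower():
            weight += ABSTRACT_WEIGHT
        if weight > 0.0:
            vector[doc_id] = weight
    return vector


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


# Toy corpus with placeholder names; not real PubMed records.
documents = {
    "d1": {"title": "geneA in diseaseX", "abstract": "geneA is discussed."},
    "d2": {"title": "A survey", "abstract": "geneA and diseaseX co-occur."},
    "d3": {"title": "diseaseX overview", "abstract": "No gene is named here."},
}

gene_vec = build_vector("geneA", documents)
disease_vec = build_vector("diseaseX", documents)
score = cosine_similarity(gene_vec, disease_vec)
print(f"cosine similarity: {score:.4f}")
```

In this sketch a pair scores highly when the gene and the disease are mentioned in the same documents with similar prominence. The MeSH integration, penalty weights, and normalization described in the abstract are not modeled here.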

Results: We investigated the overall distribution of cosine similarity values. Using the gene-disease association data in the OMIM database as the gold standard, we evaluated the performance of cosine similarity in predicting gene-disease linkage. We also investigated the effects of applying a weight matrix, penalty weights for keywords (PWK), and normalization. Finally, we demonstrated that our method outperforms heterogeneous network edge prediction (HNEP) in terms of precision and recall.
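Evaluating similarity scores against a gold standard in terms of precision and recall might look like the sketch below. It assumes a pair is predicted as associated when its score reaches a cutoff; the abstract does not say how the authors chose a cutoff, so `threshold` is a hypothetical parameter, and the scores and gold-standard pairs are made-up placeholders rather than OMIM data.

```python
def precision_recall(scores, gold_pairs, threshold):
    """Precision and recall of thresholded scores against a gold-standard set.

    scores:     dict mapping (gene, disease) pairs to similarity scores
    gold_pairs: set of (gene, disease) pairs treated as true associations
    threshold:  hypothetical cutoff; pairs scoring at or above it are predicted
    """
    predicted = {pair for pair, s in scores.items() if s >= threshold}
    true_positives = len(predicted & gold_pairs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Placeholder scores and gold standard (illustrative only).
scores = {
    ("geneA", "diseaseX"): 0.9,
    ("geneB", "diseaseX"): 0.4,
    ("geneC", "diseaseY"): 0.7,
}
gold_pairs = {("geneA", "diseaseX"), ("geneB", "diseaseX")}

precision, recall = precision_recall(scores, gold_pairs, threshold=0.5)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Sweeping `threshold` over a range of values would trace out the trade-off between the two measures.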

Conclusions: The proposed method is easy to implement, and its results can be integrated with other models to improve the overall performance of gene-disease association prediction.

Citing Articles

Literature mining discerns latent disease-gene relationships.

Rai P, Jain A, Kumar S, Sharma D, Jha N, Chawla S. Bioinformatics. 2024; 40(4).

PMID: 38608194 PMC: 11060865. DOI: 10.1093/bioinformatics/btae185.


Robustness evaluations of pathway activity inference methods on gene expression data.

Hui T, Kasim S, Abdul Aziz I, Fudzee M, Haron N, Sutikno T. BMC Bioinformatics. 2024; 25(1):23.

PMID: 38216898 PMC: 10785356. DOI: 10.1186/s12859-024-05632-w.


MantaID: a machine learning-based tool to automate the identification of biological database IDs.

Zeng Z, Hu J, Cao M, Li B, Wang X, Yu F. Database (Oxford). 2023; 2023.

PMID: 37159241 PMC: 10168000. DOI: 10.1093/database/baad028.


Cluster-based text mining for extracting drug candidates for the prevention of COVID-19 from the biomedical literature.

Supianto A, Nurdiansyah R, Weng C, Zilvan V, Yuwana R, Arisal A. J Taibah Univ Med Sci. 2023; 18(4):787-801.

PMID: 36618881 PMC: 9810500. DOI: 10.1016/j.jtumed.2022.12.015.


Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts.

Nicholson D, Himmelstein D, Greene C. BioData Min. 2022; 15(1):26.

PMID: 36258252 PMC: 9578183. DOI: 10.1186/s13040-022-00311-z.

