
Integration of Text- and Data-mining Using Ontologies Successfully Selects Disease Gene Candidates

Overview
Specialty: Biochemistry
Date: 2005 Mar 16
PMID: 15767279
Citations: 63
Abstract

Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (+/-18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
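The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that text-mining has already produced a set of anatomical terms associated with a disease, and that each candidate gene is annotated with the tissues in which it is expressed. The function name `select_candidates`, the `min_shared` threshold, the gene labels and the tissue terms are all hypothetical; this is not the authors' implementation and does not use the eVOC ontology itself.

```python
from typing import Dict, List, Set


def select_candidates(
    disease_terms: Set[str],
    gene_expression: Dict[str, Set[str]],
    min_shared: int = 1,
) -> List[str]:
    """Keep genes expressed in at least `min_shared` disease-associated tissues.

    Hypothetical helper for illustration only; the scoring rule (count of
    shared anatomical terms) is an assumption, not the published method.
    """
    scored = []
    for gene, tissues in gene_expression.items():
        shared = len(disease_terms & tissues)
        if shared >= min_shared:
            scored.append((shared, gene))
    # Genes sharing the most tissues come first; ties are broken by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored]


# Toy data (invented): terms as they might come from mining literature
# about a disease, and expression annotations for four candidate genes.
disease_terms = {"heart", "skeletal muscle"}
gene_expression = {
    "GENE_A": {"heart", "liver"},
    "GENE_B": {"brain"},
    "GENE_C": {"heart", "skeletal muscle", "kidney"},
    "GENE_D": {"lung"},
}

selected = select_candidates(disease_terms, gene_expression)
fraction_remaining = len(selected) / len(gene_expression)
print(selected, fraction_remaining)
```

In this toy case two of the four candidates are retained, so the candidate set shrinks to half its original size. The abstract's reported figure of 63.3% is the analogous measure on the real data.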

Citing Articles

Deafness gene screening based on a multilevel cascaded BPNN model.

Liu X, Teng L, Zuo W, Zhong S, Xu Y, Sun J. BMC Bioinformatics. 2023; 24(1):56.

PMID: 36803022 PMC: 9942297. DOI: 10.1186/s12859-023-05182-7.


DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases.

Essack M, Salhi A, Van Neste C, Raies A, Tifratene F, Uludag M. Oxid Med Cell Longev. 2020; 2020:5904315.

PMID: 32308806 PMC: 7142358. DOI: 10.1155/2020/5904315.


Literature-Based Enrichment Insights into Redox Control of Vascular Biology.

Essack M, Salhi A, Stanimirovic J, Tifratene F, Raies A, Hungler A. Oxid Med Cell Longev. 2019; 2019:1769437.

PMID: 31223421 PMC: 6542245. DOI: 10.1155/2019/1769437.


An integrative network-based approach to identify novel disease genes and pathways: a case study in the context of inflammatory bowel disease.

Eguchi R, Karim M, Hu P, Sato T, Ono N, Kanaya S. BMC Bioinformatics. 2018; 19(1):264.

PMID: 30005591 PMC: 6043997. DOI: 10.1186/s12859-018-2251-x.


Mimvec: a deep learning approach for analyzing the human phenome.

Gan M, Li W, Zeng W, Wang X, Jiang R. BMC Syst Biol. 2017; 11(Suppl 4):76.

PMID: 28950906 PMC: 5615244. DOI: 10.1186/s12918-017-0451-z.

