
Wide-coverage Relation Extraction from MEDLINE Using Deep Syntax

Overview
Publisher Biomed Central
Specialty Biology
Date 2015 Apr 19
PMID 25887686
Citations 7
Abstract

Background: Relation extraction is a fundamental technology in biomedical text mining. Most previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of relations that can be extracted. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework.

Results: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves considerably higher recall than the state of the art while maintaining reasonable precision. We then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can ultimately be extracted by (possibly type-specific) relation extraction systems.

Conclusion: PASMED extracts a large number of relations that existing text-mining systems have previously missed. The entire collection of relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications.

Citing Articles

Unsupervised literature mining approaches for extracting relationships pertaining to habitats and reproductive conditions of plant species.

Gabud R, Lapitan P, Mariano V, Mendoza E, Pampolina N, Clarino M. Front Artif Intell. 2024; 7:1371411.

PMID: 38845683 PMC: 11153722. DOI: 10.3389/frai.2024.1371411.


Semantics-enabled biomedical literature analytics.

Kilicoglu H, Ensan F, McInnes B, Wang L. J Biomed Inform. 2024; 150:104588.

PMID: 38244957 PMC: 11771130. DOI: 10.1016/j.jbi.2024.104588.


Learning Inter-Sentence, Disorder-Centric, Biomedical Relationships from Medical Literature.

van der Vegt A, Zuccon G, Koopman B. AMIA Annu Symp Proc. 2020; 2019:1216-1225.

PMID: 32308919 PMC: 7153107.


Using a Large Margin Context-Aware Convolutional Neural Network to Automatically Extract Disease-Disease Association from Literature: Comparative Analytic Study.

Lai P, Lu W, Kuo T, Chung C, Han J, Tsai R. JMIR Med Inform. 2019; 7(4):e14502.

PMID: 31769759 PMC: 6913619. DOI: 10.2196/14502.


COPIOUS: A gold standard corpus of named entities towards extracting species occurrence from biodiversity literature.

Nguyen N, Gabud R, Ananiadou S. Biodivers Data J. 2019; (7):e29626.

PMID: 30700967 PMC: 6351503. DOI: 10.3897/BDJ.7.e29626.

