
Sortal Anaphora Resolution to Enhance Relation Extraction from Biomedical Literature

Overview
Publisher Biomed Central
Specialty Biology
Date 2016 Apr 16
PMID 27080229
Citations 5
Abstract

Background: Entity coreference is common in biomedical literature and can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task that, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that accounts for the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from the sentence level to the discourse level.
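To make the idea of sortal anaphora concrete, the following is a minimal toy sketch, not the method described in the paper and not SemRep's implementation. It assumes a sortal anaphor is a determiner-led noun phrase (for example "this drug") whose head noun names a semantic class, and it picks the nearest preceding mention of that class. The `Mention` type, the `HEAD_TO_TYPE` table, the determiner list, and the nearest-match rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    semantic_type: str  # hypothetical coarse type label, not a real UMLS type
    position: int       # token offset of the mention in the text


# Hypothetical mapping from anaphor head nouns to coarse semantic types.
HEAD_TO_TYPE = {
    "drug": "drug",
    "agent": "drug",
    "protein": "protein",
    "disease": "disease",
    "disorder": "disease",
}

# Determiners that may introduce a sortal anaphor in this toy example.
DETERMINERS = {"this", "that", "these", "those", "the", "such"}


def sortal_head(phrase: str) -> Optional[str]:
    """Return the head noun if the phrase looks like a sortal anaphor, else None."""
    tokens = phrase.lower().split()
    if len(tokens) < 2 or tokens[0] not in DETERMINERS:
        return None
    head = tokens[-1]
    # Naive plural handling: strip a trailing "s" if the singular is known.
    if head not in HEAD_TO_TYPE and head.endswith("s"):
        head = head[:-1]
    return head if head in HEAD_TO_TYPE else None


def resolve(anaphor: str, anaphor_position: int,
            mentions: List[Mention]) -> Optional[Mention]:
    """Pick the nearest preceding mention whose type matches the anaphor head."""
    head = sortal_head(anaphor)
    if head is None:
        return None
    wanted = HEAD_TO_TYPE[head]
    candidates = [m for m in mentions
                  if m.position < anaphor_position and m.semantic_type == wanted]
    return max(candidates, key=lambda m: m.position, default=None)
```

For instance, given mentions of "aspirin" (drug), "stroke" (disease), and "warfarin" (drug) in that order, `resolve("this drug", 12, mentions)` would select "warfarin" under these assumptions, while a pronoun such as "it" is not treated as sortal and yields `None`.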

Results: We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive: in 50% of the changes, uninformative relations were replaced by more specific and informative ones, while 35% of the changes had no effect and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of the approximately 82 million semantic relations extracted from the entire PubMed.
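As a rough check on the scale of these figures, the short sketch below works through the arithmetic. It uses only the rounded numbers quoted in the abstract (about 1.5% of approximately 82 million relations, and the 50/35/15 split of changes), so the result is an order-of-magnitude estimate rather than a reported value. The F1 helper is the standard harmonic mean of precision and recall; the abstract does not give the underlying precision and recall, so none are assumed here, and all variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the abstract.
TOTAL_RELATIONS = 82_000_000   # "approximately 82 million"
CHANGED_PER_THOUSAND = 15      # "about 1.5%" expressed per thousand

# Integer arithmetic avoids floating-point rounding in the estimate.
changed_relations = TOTAL_RELATIONS * CHANGED_PER_THOUSAND // 1000

# Shares of changes by outcome, as reported for the evaluated changes.
outcome_shares = {"more informative": 50, "no effect": 35, "negative": 15}

print(f"Estimated changed relations: {changed_relations:,}")
print(f"Outcome shares sum to {sum(outcome_shares.values())}%")
```

Under these rounded inputs, the estimate comes to roughly 1.23 million changed relations.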

Conclusions: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight areas for further improvement, such as coordination processing and intra-sentential antecedent selection.

Citing Articles

Enhancing the coverage of SemRep using a relation classification approach.

Ming S, Zhang R, Kilicoglu H. J Biomed Inform. 2024; 155:104658.

PMID: 38782169 PMC: 11770837. DOI: 10.1016/j.jbi.2024.104658.


Broad-coverage biomedical relation extraction with SemRep.

Kilicoglu H, Rosemblat G, Fiszman M, Shin D. BMC Bioinformatics. 2020; 21(1):188.

PMID: 32410573 PMC: 7222583. DOI: 10.1186/s12859-020-3517-7.


An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models.

Li F, Yu H. J Am Med Inform Assoc. 2019; 26(7):646-654.

PMID: 30938761 PMC: 6562161. DOI: 10.1093/jamia/ocz018.


Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning.

Li F, Liu W, Yu H. JMIR Med Inform. 2018; 6(4):e12159.

PMID: 30478023 PMC: 6288593. DOI: 10.2196/12159.


Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing.

Neveol A, Zweigenbaum P. Yearb Med Inform. 2017; 26(1):228-234.

PMID: 29063569 PMC: 6239234. DOI: 10.15265/IY-2017-027.

