Document-level Attention-based BiLSTM-CRF Incorporating Disease Dictionary for Disease Named Entity Recognition

Overview

Journal: Comput Biol Med

Publisher: Elsevier

Specialties: Biology, General Medicine, Medical Informatics

Date: 2019 Apr 20

PMID: 31003175

Citations: 17

Authors

Kai Xu

Zhenguo Yang

Peipei Kang

Qi Wang

Wenyin Liu

Abstract

Background: Disease named entity recognition (NER) plays an important role in biomedical research. Many challenges remain; among these, identifying rare diseases and complex disease names and resolving tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.
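
To make the notion of tagging inconsistency concrete, the following minimal sketch flags surface strings that receive more than one label within a single document. The function name, the label names, and the toy document are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def find_inconsistent_mentions(tagged_mentions):
    """Return surface forms that received more than one label in one document.

    tagged_mentions: iterable of (surface_text, label) pairs.
    Matching is case-insensitive (an assumption made for this sketch).
    """
    labels_by_text = defaultdict(set)
    for text, label in tagged_mentions:
        labels_by_text[text.lower()].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical document: the same string is tagged as a disease once and missed once.
doc = [
    ("Cystic fibrosis", "Disease"),
    ("cystic fibrosis", "O"),
    ("asthma", "Disease"),
]
print(find_inconsistent_mentions(doc))  # {'cystic fibrosis': ['Disease', 'O']}
```

A document-level mechanism can, in principle, reduce such cases by letting each mention see how the same string is treated elsewhere in the document.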

Methods: We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. DABLC then builds a dictionary attention layer by combining this dictionary matching with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is equipped with the dictionary attention layer, so that dictionary knowledge is incorporated into disease NER.
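
The abstract does not specify the matching algorithm, so the sketch below is only one plausible reading of exact string matching against a dictionary: a greedy longest-match scan over tokens that emits BIO-style dictionary tags. The function name, the `B-Dic`/`I-Dic` tag names, the `max_len` parameter, and the two-entry toy dictionary are all assumptions for illustration; a real dictionary would be built from the Disease Ontology.

```python
def dictionary_match(tokens, dictionary, max_len=5):
    """Greedy longest exact match of token spans against a set of disease names.

    tokens: list of word tokens for one sentence.
    dictionary: set of lowercase, space-joined disease names.
    Returns one tag per token: 'B-Dic', 'I-Dic', or 'O'.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B-Dic"
            for j in range(i + 1, i + matched):
                tags[j] = "I-Dic"
            i += matched
        else:
            i += 1
    return tags


# Toy dictionary (assumption); not the dictionary used in the paper.
disease_dictionary = {"cystic fibrosis", "asthma"}
tokens = "Patients with cystic fibrosis often develop asthma .".split()
print(dictionary_match(tokens, disease_dictionary))
# ['O', 'O', 'B-Dic', 'I-Dic', 'O', 'O', 'B-Dic', 'O']
```

Tags of this kind could then be embedded and fed to an attention layer alongside word representations; how DABLC does this is not described in the abstract.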

Results: Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each test is applied to 10 runs of each model, with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
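
As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the reported precision and recall; the function name is an assumption, and the numbers are those given in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "NCBI": (0.883, 0.89, 0.886),
    "BioCreative V CDR": (0.891, 0.875, 0.883),
}
for corpus, (p, r, f1) in reported.items():
    print(corpus, round(f1_score(p, r), 3), f1)
# NCBI 0.886 0.886
# BioCreative V CDR 0.883 0.883
```

Both recomputed values agree with the reported F1 scores to three decimal places.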

Conclusion: DABLC combines the advantages of both external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names; moreover, it reduces the impact of tagging inconsistency. Special disease NER and deep learning models addressing long sentences are noteworthy areas for future examination.

Citing Articles

Chinese Clinical Named Entity Recognition With Segmentation Synonym Sentence Synthesis Mechanism: Algorithm Development and Validation.

Tang J, Huang Z, Xu H, Zhang H, Huang H, Tang M. JMIR Med Inform. 2024; 12:e60334.

PMID: 39622697 PMC: 11612518. DOI: 10.2196/60334.

Comparative Analysis of Large Language Models in Chinese Medical Named Entity Recognition.

Zhu Z, Zhao Q, Li J, Ge Y, Ding X, Gu T. Bioengineering (Basel). 2024; 11(10).

PMID: 39451358 PMC: 11504658. DOI: 10.3390/bioengineering11100982.

A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.

Huang D, Zeng Q, Xiong Y, Liu S, Pang C, Xia M. Interdiscip Sci. 2024; 16(2):333-344.

PMID: 38340264 PMC: 11289304. DOI: 10.1007/s12539-024-00605-2.

Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach.

Raza S, Schwartz B. BMC Med Inform Decis Mak. 2023; 23(1):20.

PMID: 36703154 PMC: 9879259. DOI: 10.1186/s12911-023-02117-3.

Extraction of knowledge graph of Covid-19 through mining of unstructured biomedical corpora.

Gajendran S, Manjula D, Sugumaran V, Hema R. Comput Biol Chem. 2023; 102:107808.

PMID: 36621289 PMC: 9807269. DOI: 10.1016/j.compbiolchem.2022.107808.