DTranNER: Biomedical Named Entity Recognition with Deep Learning-based Label-label Transition Model

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2020 Feb 13

PMID 32046638

Citations 8

Authors

S K Hong

Jae-Gil Lee

Abstract

Background: Biomedical named-entity recognition (BioNER) is widely modeled with conditional random fields (CRF) by treating it as a sequence labeling problem. CRF-based methods yield structured label outputs by imposing connectivity between labels. Recent BioNER studies have reported state-of-the-art performance by combining deep learning-based models (e.g., bidirectional Long Short-Term Memory) with CRF. In these methods, the deep learning-based models are dedicated to estimating individual labels, whereas the relationships between connected labels are described as static numbers; as a result, the model cannot reflect the context of a given input sentence when generating the most plausible label-label transitions. Yet correctly segmenting entity mentions in biomedical texts is challenging because biomedical terms are often descriptive and long compared with general terms. Therefore, limiting the label-label transitions to static numbers is a bottleneck in improving BioNER performance.
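To make the "static numbers" point concrete, the following is a minimal sketch of how a conventional linear-chain CRF scores a label path. It is not taken from the paper: the three-label set, the transition table, and the unary scores are all made-up values for illustration. The point to notice is that one fixed transition table is reused for every sentence and every position.

```python
# Illustrative linear-chain CRF scoring with a static transition table.
# The label set and all numbers below are invented for this sketch.
LABELS = ["O", "B", "I"]

# STATIC_TRANSITION[i][j]: score of moving from label i to label j.
# The same table is applied at every position of every sentence.
STATIC_TRANSITION = [
    [0.5, 0.2, -5.0],   # from O (O -> I is strongly penalized)
    [-0.5, -1.0, 1.0],  # from B
    [0.0, -0.5, 0.8],   # from I
]


def static_crf_score(unary, path, transition=STATIC_TRANSITION):
    """Unnormalized score of a label path under a static-transition CRF.

    unary[t][k] is the score of label k at token t (for example, produced
    by a BiLSTM); path is a list of label indices, one per token.
    """
    score = unary[0][path[0]]
    for t in range(1, len(path)):
        score += transition[path[t - 1]][path[t]] + unary[t][path[t]]
    return score


# Hypothetical per-token scores for a three-token sentence.
unary = [
    [0.1, 2.0, 0.0],
    [0.3, 0.1, 1.5],
    [1.0, 0.0, 0.9],
]
score_bio = static_crf_score(unary, [1, 2, 0])  # path B, I, O
score_bii = static_crf_score(unary, [1, 2, 2])  # path B, I, I
```

With these numbers the path B, I, I outscores B, I, O only because the fixed I-to-I entry is larger than the fixed I-to-O entry; nothing about the sentence itself can change that preference.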

Results: We introduce DTranNER, a novel CRF-based framework that incorporates a deep learning-based label-label transition model into BioNER. DTranNER uses two separate deep learning-based networks: Unary-Network and Pairwise-Network. The former models the input to determine individual labels, and the latter explores the context of the input to describe the label-label transitions. We performed experiments on five benchmark BioNER corpora. Compared with current state-of-the-art methods, DTranNER achieves the best F1-score on four of the five benchmarks: 84.56% (versus 84.40%) on the BioCreative II gene mention (BC2GM) corpus, 91.99% (versus 91.41%) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, and 94.16% (versus 93.44%) on chemical NER and 87.22% (versus 86.56%) on disease NER of the BioCreative V chemical disease relation (BC5CDR) corpus. It achieves a near-best F1-score of 88.62% on the NCBI-Disease corpus.
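The two-network idea can be sketched as Viterbi decoding in which the transition scores differ at each position. This is an illustrative sketch, not the authors' implementation: the abstract does not specify how the two networks' outputs are combined, so additive combination is an assumption here, and the function name `viterbi_dynamic` and the hand-written score arrays are hypothetical stand-ins for the outputs of Unary-Network and Pairwise-Network.

```python
def viterbi_dynamic(unary, pairwise):
    """Best label path when transition scores vary by position.

    unary[t][k]       : score of label k at token t
                        (stand-in for a Unary-Network output).
    pairwise[t][i][j] : score of label i at token t followed by label j
                        at token t + 1, for t in 0..n-2
                        (stand-in for a Pairwise-Network output).
    Returns (best_score, best_path) with labels given as indices.
    Scores are combined additively, which is an assumption of this sketch.
    """
    n, k = len(unary), len(unary[0])
    best = list(unary[0])   # best score of any path ending in each label
    back = []               # back-pointers, one list per transition
    for t in range(1, n):
        new_best, pointers = [], []
        for j in range(k):
            cands = [best[i] + pairwise[t - 1][i][j] for i in range(k)]
            i_star = max(range(k), key=cands.__getitem__)
            new_best.append(cands[i_star] + unary[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(k), key=best.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Two labels (0 = outside, 1 = entity), three tokens, uninformative unary
# scores, so the decision is driven entirely by the per-position transitions.
unary = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
pairwise = [
    [[0.0, 1.0], [0.0, 0.0]],  # tokens 0 -> 1: outside -> entity favored
    [[0.0, 0.0], [2.0, 0.0]],  # tokens 1 -> 2: entity -> outside favored
]
best_score, best_path = viterbi_dynamic(unary, pairwise)
```

A static-transition CRF is the special case where every `pairwise[t]` is the same table; letting the table change with `t` is what allows the sentence context to influence where an entity mention begins and ends.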

Conclusions: Our results indicate that incorporating the deep learning-based label-label transition model provides distinctive contextual clues that enhance BioNER over the static transition model. We demonstrate that the proposed framework enables the dynamic transition model to adaptively explore the contextual relations between adjacent labels in a fine-grained way. We expect that our study can be a stepping stone for further progress in biomedical literature mining.

Citing Articles

BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Alamro H, Gojobori T, Essack M, Gao X Sci Rep. 2024; 14(1):7697.

PMID: 38565624 PMC: 10987643. DOI: 10.1038/s41598-024-58334-x.

Biomedical named entity recognition based on fusion multi-features embedding.

Li M, Yang H, Liu Y Technol Health Care. 2023; 31(S1):111-121.

PMID: 37038786 PMC: 10258877. DOI: 10.3233/THC-236011.

A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.

Lin S, Yeh W, Chiu Y, Chang Y, Hsu M, Chen Y Database (Oxford). 2022; 2022.

PMID: 35849027 PMC: 9290865. DOI: 10.1093/database/baac056.

Chinese Clinical Named Entity Recognition with ALBERT and MHA Mechanism.

Li D, Long J, Qu J, Zhang X Evid Based Complement Alternat Med. 2022; 2022:2056039.

PMID: 35656458 PMC: 9152388. DOI: 10.1155/2022/2056039.

Parallel sequence tagging for concept recognition.

Furrer L, Cornelius J, Rinaldi F BMC Bioinformatics. 2022; 22(Suppl 1):623.

PMID: 35331131 PMC: 8943923. DOI: 10.1186/s12859-021-04511-y.
