Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations

Overview

Journal JMIR Med Inform

Publisher JMIR Publications

Specialty Medical Informatics

Date 2020 Sep 5

PMID 32885786

Citations 9

Authors

Yongbin Li

Xiaohua Wang

Linhu Hui

Liping Zou

Hongjin Li

Luo Xu

Weihai Liu

Abstract

Background: Clinical named entity recognition (CNER), which aims to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction in clinical text mining and information extraction. Progress in CNER can support clinical decision making and the construction of medical knowledge bases, which could in turn improve the overall quality of medical care. Compared with English CNER, Chinese CNER was developed later and is more challenging because of the complexity of Chinese word segmentation and grammar.

Objective: With the development of distributed representations and deep learning, a series of models have been applied to Chinese CNER. Unlike English CNER, Chinese CNER methods are mainly divided into character-based and word-based approaches, which cannot make full use of the information in EMRs and cannot resolve the ambiguity of word representations.

Methods: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant of contextualized character representations and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively use information from both the characters and the words in Chinese EMRs; in addition, the variant ELMo model takes Chinese characters as input in place of ELMo's character-encoding layer, so as to learn domain-specific contextualized character embeddings.
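To make the lattice idea concrete: a lattice LSTM keeps one cell per character and, for every lexicon word spanning characters b through e, adds a word cell whose output is merged into the character cell at e. The sketch below covers only the preprocessing step that finds those words. It is a minimal sketch under stated assumptions: the toy lexicon, the maximum word length of 4, and the helper names `match_lattice_words` and `words_ending_at` are hypothetical and are not the authors' code or data; the LSTM gating itself is omitted.

```python
from typing import List, Set, Tuple

# (start, end, word) with an inclusive end index, mirroring the w_{b,e} notation.
Match = Tuple[int, int, str]


def match_lattice_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Match]:
    """Find every multi-character lexicon word in `sentence`.

    Single characters are skipped because each character already has its own
    cell in the lattice. `max_len` caps the word length (a hypothetical choice).
    """
    matches: List[Match] = []
    n = len(sentence)
    for b in range(n):
        # Candidate words of length 2..max_len starting at position b.
        for e in range(b + 1, min(n, b + max_len)):
            word = sentence[b : e + 1]
            if word in lexicon:
                matches.append((b, e, word))
    return matches


def words_ending_at(matches: List[Match], n: int) -> List[List[Tuple[int, str]]]:
    """Group matched words by end index; the character cell at e merges these word cells."""
    by_end: List[List[Tuple[int, str]]] = [[] for _ in range(n)]
    for b, e, word in matches:
        by_end[e].append((b, word))
    return by_end


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real model would use a large pretrained word vocabulary.
    lexicon = {"患者", "胃癌", "术后", "胃癌术后"}
    sentence = "患者胃癌术后"
    found = match_lattice_words(sentence, lexicon)
    print(found)
    # [(0, 1, '患者'), (2, 3, '胃癌'), (2, 5, '胃癌术后'), (4, 5, '术后')]
    print(words_ending_at(found, len(sentence))[5])
    # [(2, '胃癌术后'), (4, '术后')]
```

For the toy sentence 患者胃癌术后 ("the patient, after gastric cancer surgery"), the last character 后 receives two candidate words, 胃癌术后 and 术后. The lattice lets the model weigh both segmentations instead of committing to a single one, which is how it avoids propagating segmentation errors.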

Results: We evaluated our method on two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively.
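The abstract reports F1 scores but does not say exactly how they are computed. CNER F1 is commonly computed at the entity level: an entity counts as correct only if both its boundaries and its type match the gold annotation, and precision and recall are micro-averaged over all entities. The sketch below assumes that strict criterion and a BIO tagging scheme; the tag names (`B-DIS`, `I-DIS`, `B-SYM`) and the helpers `bio_to_spans` and `entity_f1` are hypothetical, and the official CCKS scorers may differ in detail.

```python
from typing import List, Optional, Set, Tuple

# (start, end, entity_type) with an inclusive end index.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans.

    A stray I- tag (no open entity of the same type) is treated as O here;
    other scorers may instead treat it as the start of a new entity.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i - 1, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 with strict span-and-type matching."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)  # exact boundary and type match
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-DIS", "I-DIS", "O", "B-SYM"]]
    pred = [["B-DIS", "I-DIS", "O", "O"]]  # misses the symptom entity
    print(entity_f1(gold, pred))
```

In this toy case the prediction finds the disease entity but misses the symptom, so precision is 1.0 and recall is 0.5, giving F1 = 2(1.0)(0.5)/(1.0 + 0.5) ≈ 0.667. Under strict matching, a predicted entity with a correct type but a boundary off by one character earns no credit.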

Conclusions: Our results show that the proposed method is effective for Chinese CNER. In addition, our experiments show that the variant contextualized character representations significantly improve the model's performance.

Citing Articles

Chinese Clinical Named Entity Recognition With Segmentation Synonym Sentence Synthesis Mechanism: Algorithm Development and Validation.

Tang J, Huang Z, Xu H, Zhang H, Huang H, Tang M. JMIR Med Inform. 2024; 12:e60334.

PMID: 39622697 PMC: 11612518. DOI: 10.2196/60334.

MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records.

Du H, Xu J, Du Z, Chen L, Ma S, Wei D. Interdiscip Sci. 2024; 16(2):489-502.

PMID: 38578388 PMC: 11289171. DOI: 10.1007/s12539-024-00624-z.

A BERT-Span model for Chinese named entity recognition in rehabilitation medicine.

Zhong J, Xuan Z, Wang K, Cheng Z. PeerJ Comput Sci. 2023; 9:e1535.

PMID: 37705622 PMC: 10495977. DOI: 10.7717/peerj-cs.1535.

Chinese medical entity recognition based on the dual-branch TENER model.

Peng H, Zhang Z, Liu D, Qin X. BMC Med Inform Decis Mak. 2023; 23(1):136.

PMID: 37488521 PMC: 10367390. DOI: 10.1186/s12911-023-02243-y.

Chinese Clinical Named Entity Recognition From Electronic Medical Records Based on Multisemantic Features by Using Robustly Optimized Bidirectional Encoder Representation From Transformers Pretraining Approach Whole Word Masking and Convolutional....

Wang W, Li X, Ren H, Gao D, Fang A. JMIR Med Inform. 2023; 11:e44597.

PMID: 37163343 PMC: 10209791. DOI: 10.2196/44597.
