Comparison of Named Entity Recognition Methodologies in Biomedical Documents

Overview

Journal Biomed Eng Online

Publisher Biomed Central

Specialty Biomedical Engineering

Date 2018 Nov 7

PMID 30396340

Citations 4

Authors

Hye-Jeong Song

Byeong-Cheol Jo

Chan-Young Park

Jong-Dae Kim

Yu-Seop Kim

Abstract

Background: Biomedical named entity recognition (Bio-NER) is the task of identifying biomedical terms in text, such as RNA, protein, cell type, cell line, and DNA, and is one of the most fundamental tasks in biomedical knowledge discovery from text. The system described here was developed using the BioNLP/NLPBA 2004 shared task data, and experiments were conducted on the training and evaluation sets provided by the task organizers.
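
To make the task concrete, the following is a minimal sketch of how entity mentions can be read off a token sequence labeled in the B-/I-/O tagging scheme commonly used for this kind of data. The example sentence, its tags, and the helper name `extract_entities` are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from B-/I-/O tags."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # An O tag (or an inconsistent I- tag) closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical sentence fragment with two entity mentions.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
print(extract_entities(tokens, tags))
# [('IL-2 gene', 'DNA'), ('T cells', 'cell_type')]
```

A recognizer such as a CRF or an RNN predicts one tag per token; a decoding step like this one turns the tag sequence back into typed entity mentions.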

Results: Compared with a baseline F1 score of 70.09%, the Jordan-type and Elman-type RNN algorithms achieve F1 scores of approximately 60.53% and 58.80%, respectively. With CRF as the machine learning algorithm, CCA, GloVe, and Word2Vec achieve F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
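
The F1 score reported here is the harmonic mean of precision and recall. The following is a minimal sketch of that computation, assuming exact-match scoring over sets of (start, end, type) spans. The function name `f1_score` and the toy spans are illustrative assumptions, and the sketch does not reproduce the shared task's official evaluation script.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index, entity type)


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if boundaries and type both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted annotations for one sentence.
gold = {(0, 1, "DNA"), (4, 5, "cell_type"), (8, 8, "protein")}
predicted = {(0, 1, "DNA"), (4, 5, "cell_line"), (8, 8, "protein")}

# Two of three predictions match, so precision = recall = F1 = 2/3.
print(round(f1_score(gold, predicted), 4))  # 0.6667
```

Under exact matching, the mistyped span counts as both a false positive and a false negative, which is why a single type confusion lowers precision and recall together.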

Conclusions: Using word embeddings constructed through unsupervised learning can save the time and cost required to construct training data.

Citing Articles

A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.

Huang D, Zeng Q, Xiong Y, Liu S, Pang C, Xia M. Interdiscip Sci. 2024; 16(2):333-344.

PMID: 38340264 PMC: 11289304. DOI: 10.1007/s12539-024-00605-2.

A Systematic Approach to Configuring MetaMap for Optimal Performance.

Jing X, Indani A, Hubig N, Min H, Gong Y, Cimino J. Methods Inf Med. 2022; 61(S 02):e51-e63.

PMID: 35613942 PMC: 9788913. DOI: 10.1055/a-1862-0421.

Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem.

Zaslavsky L, Cheng T, Gindulyte A, He S, Kim S, Li Q. Front Res Metr Anal. 2021; 6:689059.

PMID: 34322655 PMC: 8311438. DOI: 10.3389/frma.2021.689059.

Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review.

Nam H, Yamada R, Park H. Genomics Inform. 2020; 18(2):e13.

PMID: 32634867 PMC: 7362947. DOI: 10.5808/GI.2020.18.2.e13.
