Large-scale Application of Named Entity Recognition to Biomedicine and Epidemiology

Overview
Date 2023 Feb 22
PMID 36812589
Abstract

Background: Despite significant advances in biomedical named entity recognition methods, the clinical application of these systems still faces several challenges: (1) most methods are trained on a limited set of clinical entities; (2) they rely heavily on large amounts of data for both pre-training and prediction, which makes them impractical to use in production; (3) they do not consider non-clinical entities that are also related to a patient's health, such as social, economic, or demographic factors.

Methods: In this paper, we develop Bio-Epidemiology-NER (https://pypi.org/project/Bio-Epidemiology-NER/), an open-source Python package for detecting biomedical named entities in text. The approach is based on a Transformer model trained on a dataset annotated with many named entity types (medical, clinical, biomedical, and epidemiological). It improves on previous efforts in three ways: (1) it recognizes many clinical entity types, such as medical risk factors, vital signs, drugs, and biological functions; (2) it is easily configurable, reusable, and can scale up for training and inference; (3) it also considers non-clinical factors (such as age, gender, race, and social history) that influence health outcomes. At a high level, it consists of four phases: pre-processing, data parsing, named entity recognition, and named entity enhancement.
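
The four phases can be pictured with a minimal Python sketch. This is an illustration only and does not use the Bio-Epidemiology-NER API: the function names, the label names (GENDER, SYMPTOM, DRUG), and the toy lexicon are all hypothetical, a dictionary lookup stands in for the trained Transformer tagger, and "enhancement" is assumed here to mean post-processing that normalizes and deduplicates the recognized entities.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon; a stand-in for the trained Transformer tagger.
TOY_LEXICON = {"female": "GENDER", "fever": "SYMPTOM", "aspirin": "DRUG"}


@dataclass(frozen=True)
class Entity:
    text: str   # normalized (lower-cased) surface form
    label: str  # hypothetical entity type


def preprocess(raw: str) -> str:
    """Phase 1 (assumed): collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw).strip()


def parse(text: str) -> List[str]:
    """Phase 2 (assumed): naive sentence splitting on end punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def recognize(sentence: str) -> List[Entity]:
    """Phase 3: tag each word via dictionary lookup (toy model)."""
    found = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group().lower()
        if word in TOY_LEXICON:
            found.append(Entity(word, TOY_LEXICON[word]))
    return found


def enhance(entities: List[Entity]) -> List[Entity]:
    """Phase 4 (assumed): drop repeated mentions, keep first-seen order."""
    seen = set()
    unique = []
    for entity in entities:
        if entity not in seen:
            seen.add(entity)
            unique.append(entity)
    return unique


def run_pipeline(raw: str) -> List[Entity]:
    """Chain the four phases over a raw document."""
    entities = []
    for sentence in parse(preprocess(raw)):
        entities.extend(recognize(sentence))
    return enhance(entities)


if __name__ == "__main__":
    note = "A 45-year-old  female presented with fever.\nShe was given aspirin. Fever resolved."
    for entity in run_pipeline(note):
        print(entity.label, entity.text)
```

Keeping each phase as a separate function mirrors the staged design described above: the recognition step could be swapped for a real model without touching the other phases.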

Results: Experimental results show that our pipeline outperforms other methods on three benchmark datasets, with macro- and micro-averaged F1 scores of around 90 percent or above.
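
For readers unfamiliar with the two averages, the sketch below shows how micro- and macro-averaged F1 differ. The per-type counts are made-up illustrative numbers, not results from the paper, and the function names are hypothetical. Micro-averaging pools true positives, false positives, and false negatives across entity types before computing F1; macro-averaging computes F1 per type and then takes the unweighted mean.

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) from per-type (TP, FP, FN) counts."""
    # Micro: pool the counts over all entity types, then compute one F1.
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro = f1(total_tp, total_fp, total_fn)
    # Macro: compute F1 per entity type, then take the unweighted mean.
    per_type = [f1(tp, fp, fn) for tp, fp, fn in counts.values()]
    macro = sum(per_type) / len(per_type) if per_type else 0.0
    return micro, macro


if __name__ == "__main__":
    # Illustrative counts only: (TP, FP, FN) for two hypothetical entity types.
    example = {"DRUG": (90, 10, 10), "SYMPTOM": (40, 10, 30)}
    micro, macro = micro_macro_f1(example)
    print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

Because the macro average weights every entity type equally, a rare type with poor performance lowers it more than it lowers the micro average, which is why both are usually reported together.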

Conclusion: This package is made publicly available for researchers, doctors, clinicians, and anyone to extract biomedical named entities from unstructured biomedical texts.

Citing Articles

Precision in Parsing: Evaluation of an Open-Source Named Entity Recognizer (NER) in Veterinary Oncology.

Pinard C, Poon A, Lagree A, Wu K, Li J, Tran W Vet Comp Oncol. 2024; 23(1):102-108.

PMID: 39711253 PMC: 11830456. DOI: 10.1111/vco.13035.


Task-Specific Transformer-Based Language Models in Health Care: Scoping Review.

Cho H, Jun T, Kim Y, Kang H, Ahn I, Gwon H JMIR Med Inform. 2024; 12:e49724.

PMID: 39556827 PMC: 11612605. DOI: 10.2196/49724.


Hospital Re-Admission Prediction Using Named Entity Recognition and Explainable Machine Learning.

Dafrallah S, Akhloufi M Diagnostics (Basel). 2024; 14(19).

PMID: 39410555 PMC: 11475863. DOI: 10.3390/diagnostics14192151.


Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles.

Shin A, Anibal J, Jin Q, Lu Z ArXiv. 2024.

PMID: 38903741 PMC: 11188129.


Automating Clinical Trial Matches Via Natural Language Processing of Synthetic Electronic Health Records and Clinical Trial Eligibility Criteria.

Murcia V, Aggarwal V, Pesaladinne N, Thammineni R, Do N, Alterovitz G AMIA Jt Summits Transl Sci Proc. 2024; 2024:125-134.

PMID: 38827083 PMC: 11141802.

