Clinical Concept Annotation with Contextual Word Embedding in Active Transfer Learning Environment

Overview

Journal Digit Health

Specialty Medical Informatics

Date 2024 Dec 23

PMID 39711738

Authors

Asim Abbas

Mark Lee

Niloofer Shanavas

Venelin Kovatchev

Abstract

Objective: This study presents an active learning approach that automatically extracts clinical concepts from unstructured data and classifies them into explicit categories such as Problem, Treatment, and Test while maintaining high precision and recall. The approach is demonstrated through experiments on public i2b2 datasets.

Methods: Initial labeled data are acquired with a lexical-based approach, in amounts sufficient to start the active learning process. A contextual word embedding similarity approach, using BERT-base variant models such as ClinicalBERT, DistilBERT, and SCIBERT, then automatically classifies unlabeled clinical concepts into the explicit categories. Additionally, deep learning models and large language models (LLMs) are trained on the labeled data acquired through active learning.
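The embedding-similarity step described in the Methods can be illustrated with a minimal sketch. The abstract does not state which similarity measure or aggregation the authors used, so cosine similarity against per-category centroids is an assumption here, and the three-dimensional vectors are hypothetical stand-ins for real contextual embeddings from a model such as SCIBERT. The function names are likewise illustrative and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(concept_vec, labeled_vecs):
    """Return the category whose centroid is most similar to concept_vec.

    labeled_vecs maps a category name to the embeddings of concepts that
    already carry that label (for example, from a lexical-based seed step).
    """
    centroids = {label: centroid(vs) for label, vs in labeled_vecs.items()}
    scores = {label: cosine(concept_vec, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy embeddings; real ones would come from a BERT-style encoder.
seed = {
    "Problem": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Treatment": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "Test": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}
label, score = classify([0.85, 0.15, 0.05], seed)
print(label, round(score, 3))
```

In an active learning loop, a concept whose best score falls below a chosen threshold could be routed to a human annotator instead of being labeled automatically; that threshold is a design choice the abstract does not specify.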

Results: Using i2b2 datasets (426 clinical notes), the lexical-based method achieved a precision, recall, and F1-score of 76%, 70%, and 73%, respectively. SCIBERT performed best in active transfer learning, yielding a precision of 70.84%, recall of 77.40%, F1-score of 73.97%, and accuracy of 69.30%, surpassing counterpart models. Among deep learning models, convolutional neural networks (CNNs) trained with embeddings (BERTBase, DistilBERT, SCIBERT, ClinicalBERT) achieved training accuracies of 92-95% and testing accuracies of 89-93%, higher than those of the other deep learning models. Additionally, the LLMs were evaluated individually; among them, ClinicalBERT achieved the highest performance, with a training accuracy of 98.4% and a testing accuracy of 96%.
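As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below applies that standard formula to the precision and recall values quoted above; the function name is illustrative. The computed values agree with the reported F1-scores of 73% and 73.97% after rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the Results, in percent.
lexical_f1 = f1_score(76.0, 70.0)
scibert_f1 = f1_score(70.84, 77.40)

print(f"lexical-based F1: {lexical_f1:.2f}%")
print(f"SCIBERT F1: {scibert_f1:.2f}%")
```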

Conclusions: The proposed methodology enhances clinical concept extraction by integrating active learning with models such as SCIBERT and CNNs. It improves annotation efficiency while maintaining high accuracy, showing potential for clinical applications.
