Applying Active Learning to Assertion Classification of Concepts in Clinical Text

Overview

Journal J Biomed Inform

Publisher Elsevier

Specialty Medical Informatics

Date 2011 Dec 1

PMID 22127105

Citations 16

Authors

Yukun Chen

Subramani Mani

Hua Xu

Abstract

Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because physicians must be involved in the annotation. Active learning, an approach that actively selects the samples to annotate from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its use in clinical NLP. This paper reports an application of active learning to a clinical text classification task: determining the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their performance. The outcome is reported as the global ALC score, which is based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method of random sampling (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.
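The two quantities the abstract reports, an area under the learning curve and a reduction in annotation effort, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula for the global ALC score, so the log2-scaled sample axis and the normalization by the axis range are assumptions made here for illustration, and the function names are hypothetical. The AUC values in the example curves are made up and are not results from the paper; only the sample counts 32 and 12 come from the abstract.

```python
import math


def area_under_learning_curve(sample_counts, auc_scores):
    """Normalized trapezoidal area under an AUC-vs-samples learning curve.

    Assumption: the x-axis is log2(number of annotated samples) and the
    area is divided by the x-range, so a constant AUC of c scores c.
    """
    if len(sample_counts) != len(auc_scores) or len(sample_counts) < 2:
        raise ValueError("need at least two matching (count, AUC) points")
    xs = [math.log2(n) for n in sample_counts]
    area = 0.0
    for i in range(1, len(xs)):
        # Trapezoid between consecutive points on the curve.
        area += (xs[i] - xs[i - 1]) * (auc_scores[i] + auc_scores[i - 1]) / 2.0
    return area / (xs[-1] - xs[0])


def annotation_reduction(passive_samples, active_samples):
    """Fraction of annotation effort saved relative to passive learning."""
    return (passive_samples - active_samples) / passive_samples


# Illustrative, made-up learning curves (not data from the paper).
counts = [2, 4, 8, 16, 32]
random_auc = [0.60, 0.66, 0.71, 0.75, 0.79]
active_auc = [0.60, 0.70, 0.76, 0.80, 0.82]

print("random sampling:", area_under_learning_curve(counts, random_auc))
print("active learning:", area_under_learning_curve(counts, active_auc))

# Sample counts taken from the abstract: 32 (random) versus 12 (active).
print("annotation reduction:", annotation_reduction(32, 12))
```

A curve that climbs faster encloses more area, which is why a single area score can summarize how quickly a sampling strategy reaches a given AUC. The last line reproduces the abstract's arithmetic: (32 - 12) / 32 = 0.625, that is, 62.5%.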

Citing Articles

Scalable information extraction from free text electronic health records using large language models.

Gu B, Shao V, Liao Z, Carducci V, Brufau S, Yang J BMC Med Res Methodol. 2025; 25(1):23.

PMID: 39871166 PMC: 11773977. DOI: 10.1186/s12874-025-02470-z.

The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses.

Boetje J, van de Schoot R Syst Rev. 2024; 13(1):81.

PMID: 38429798 PMC: 10908130. DOI: 10.1186/s13643-024-02502-7.

Active learning-based systematic reviewing using switching classification models: the case of the onset, maintenance, and relapse of depressive disorders.

Teijema J, Hofstee L, Brouwer M, de Bruin J, Ferdinands G, de Boer J Front Res Metr Anal. 2023; 8:1178181.

PMID: 37260784 PMC: 10227618. DOI: 10.3389/frma.2023.1178181.

Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study.

Ahne A, Fagherazzi G, Tannier X, Czernichow T, Orchard F J Med Internet Res. 2022; 24(1):e27434.

PMID: 35040795 PMC: 8808347. DOI: 10.2196/27434.

Deep active learning for classifying cancer pathology reports.

De Angeli K, Gao S, Alawad M, Yoon H, Schaefferkoetter N, Wu X BMC Bioinformatics. 2021; 22(1):113.

PMID: 33750288 PMC: 7941989. DOI: 10.1186/s12859-021-04047-1.
