
Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Overview
Date 2022 Aug 8
PMID 35939277
Abstract

Applying natural language processing methods to electronic health record (EHR) data is a growing field. Existing corpora and annotations focus on modeling textual features and predicting relations. However, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction, and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected as a time series in a problem-oriented format. A progress note conventionally follows the Subjective, Objective, Assessment and Plan (SOAP) headings. We also define a new suite of tasks, Progress Note Understanding, comprising three tasks that utilize the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
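The abstract describes progress notes organized under SOAP headings and a three-stage annotation hierarchy. As a minimal sketch, assuming each section begins with a `Heading:` line, the Python below splits a note into SOAP sections and defines a container with one field per annotation stage. The names `split_soap` and `ProgressNoteAnnotation`, and the shape of each field, are hypothetical illustrations, not the paper's released schema or data format.

```python
import re
from dataclasses import dataclass, field

# Assumption: sections are introduced by one of these four headings plus a colon.
SOAP_HEADINGS = ("Subjective", "Objective", "Assessment", "Plan")

# Matches a line that starts with a SOAP heading and a colon; group 2 is any
# text that follows the colon on the same line.
_HEADING_RE = re.compile(
    r"^\s*(" + "|".join(SOAP_HEADINGS) + r")\s*:\s*(.*)$", re.IGNORECASE
)


@dataclass
class ProgressNoteAnnotation:
    """Hypothetical record with one field per annotation stage."""
    # Stage 1 (text understanding): section name -> section text.
    sections: dict[str, str] = field(default_factory=dict)
    # Stage 2 (clinical reasoning): hypothetical (assessment span, plan span) pairs.
    reasoning_links: list[tuple[str, str]] = field(default_factory=list)
    # Stage 3 (summarization): hypothetical list of summarized problems.
    summary: list[str] = field(default_factory=list)


def split_soap(note: str) -> dict[str, str]:
    """Naively split a note into SOAP sections keyed by normalized heading.

    Lines before the first recognized heading are dropped; blank lines are
    skipped; lines within a section are joined with single spaces.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in note.splitlines():
        match = _HEADING_RE.match(line)
        if match:
            current = match.group(1).capitalize()  # "PLAN" -> "Plan"
            sections.setdefault(current, [])
            rest = match.group(2).strip()
            if rest:
                sections[current].append(rest)
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}
```

Real notes use many heading variants (for example "A/P" or "Assessment and Plan"), which this exact-match rule does not recognize. That gap is why section identification is typically framed as a learned classification task rather than handled with fixed patterns.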

Citing Articles

Generalizable clinical note section identification with large language models.

Zhou W, Miller T. JAMIA Open. 2024; 7(3):ooae075.

PMID: 39139700 PMC: 11319784. DOI: 10.1093/jamiaopen/ooae075.


Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles.

Zhou W, Dligach D, Afshar M, Gao Y, Miller T. Proc Conf Assoc Comput Linguist Meet. 2023; 2023:125-130.

PMID: 37786810 PMC: 10544420.


Improving model transferability for clinical note section classification models using continued pretraining.

Zhou W, Yetisgen M, Afshar M, Gao Y, Savova G, Miller T. J Am Med Inform Assoc. 2023; 31(1):89-97.

PMID: 37725927 PMC: 10746297. DOI: 10.1093/jamia/ocad190.


Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes.

Gao Y, Dligach D, Miller T, Churpek M, Afshar M. Proc Conf Assoc Comput Linguist Meet. 2023; 2023:461-467.

PMID: 37583489 PMC: 10426335. DOI: 10.18653/v1/2023.bionlp-1.43.


Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning.

Sharma B, Gao Y, Miller T, Churpek M, Afshar M, Dligach D. Proc Conf Assoc Comput Linguist Meet. 2023; 2023(ClinicalNLP):78-85.

PMID: 37492270 PMC: 10368094.


Bodenreider O . The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2003; 32(Database issue):D267-70. PMC: 308795. DOI: 10.1093/nar/gkh061. View