
The 2022 n2c2/UW Shared Task on Extracting Social Determinants of Health

Overview
Date 2023 Feb 16
PMID 36795066
Abstract

Objective: The n2c2/UW SDOH Challenge explores the extraction of social determinants of health (SDOH) information from clinical notes. The objectives include advancing natural language processing (NLP) information extraction techniques for SDOH and for clinical information more broadly. This article presents the shared task, data, participating teams, performance results, and considerations for future work.

Materials and Methods: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events, such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes 3 subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants utilized a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LMs).
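To make the event-based annotation idea concrete, the following is a minimal Python sketch of how one SDOH event with status, extent, and temporality attributes might be represented. It is not the SHAC schema: the class name `SDOHEvent`, the `trigger` field, the event-type labels, and the attribute values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical labels for the event categories named in the abstract;
# the actual corpus labels may differ.
EVENT_TYPES = {"Alcohol", "Drug", "Tobacco", "Employment", "LivingSituation"}


@dataclass
class SDOHEvent:
    """One annotated SDOH event (illustrative structure, not the SHAC format)."""

    event_type: str  # one of EVENT_TYPES
    trigger: str     # assumed: the text span that signals the event
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject event types outside the assumed label set.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")


# Made-up example sentence: "Quit smoking 5 years ago, previously 1 pack per day."
event = SDOHEvent(
    event_type="Tobacco",
    trigger="smoking",
    attributes={
        "status": "past",               # assumed value set
        "extent": "1 pack per day",     # amount or frequency
        "temporality": "5 years ago",   # when the status applied
    },
)
print(event)
```

Keeping the attributes in a dictionary lets the same structure serve every event type, whichever attributes apply to it.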

Results: A total of 15 teams participated, and the top teams utilized pretrained deep learning LMs. The top team across all subtasks used a sequence-to-sequence approach, achieving 0.901 F1 for Subtask A, 0.774 F1 for Subtask B, and 0.889 F1 for Subtask C.
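The F1 values above combine precision and recall as their harmonic mean. The abstract does not state the task's matching criteria, so the sketch below assumes exact matching of (event type, attribute, value) tuples; the function `f1_score` and the sample data are hypothetical and do not reproduce the official scorer.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str, str]  # (event type, attribute, value), an assumed unit


def f1_score(gold: Set[Annotation], predicted: Set[Annotation]) -> float:
    """Harmonic mean of precision and recall over exactly matching tuples."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one note.
gold = {
    ("Tobacco", "status", "past"),
    ("Alcohol", "status", "current"),
    ("Employment", "status", "employed"),
}
predicted = {
    ("Tobacco", "status", "past"),
    ("Alcohol", "status", "none"),
}

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")  # 1 match: precision 1/2, recall 1/3, F1 = 0.400
```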

Conclusions: As in many NLP tasks and domains, pretrained LMs yielded the best performance, including for generalizability and learning transfer. An error analysis indicates that extraction performance varies by SDOH: performance was lower for conditions that increase health risks (risk factors), such as substance use and homelessness, and higher for conditions that reduce health risks (protective factors), such as substance abstinence and living with family.

Citing Articles

SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from medical notes.

Gu Z, He L, Naeem A, Chan P, Mohamed A, Khalil H. medRxiv. 2025.

PMID: 40034759 PMC: 11875322. DOI: 10.1101/2025.02.19.25322576.


Decoding substance use disorder severity from clinical notes using a large language model.

Mahbub M, Dams G, Srinivasan S, Rizy C, Danciu I, Trafton J. Npj Ment Health Res. 2025; 4(1):5.

PMID: 39915681 PMC: 11802718. DOI: 10.1038/s44184-024-00114-6.


Multifaceted Natural Language Processing Task-Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation.

Kim K, Park S, Min J, Park S, Kim J, Eun J. JMIR Med Inform. 2024; 12:e52897.

PMID: 39475725 PMC: 11539635. DOI: 10.2196/52897.


CACER: Clinical concept Annotations for Cancer Events and Relations.

Fu Y, Ramachandran G, Halwani A, McInnes B, Xia F, Lybarger K. J Am Med Inform Assoc. 2024; 31(11):2583-2594.

PMID: 39225779 PMC: 11491616. DOI: 10.1093/jamia/ocae231.


Disambiguation of acronyms in clinical narratives with large language models.

Kugic A, Schulz S, Kreuzthaler M. J Am Med Inform Assoc. 2024; 31(9):2040-2046.

PMID: 38917444 PMC: 11339513. DOI: 10.1093/jamia/ocae157.

