
Identification of Social Determinants of Health Using Multi-label Classification of Electronic Health Record Clinical Notes

Overview
Journal JAMIA Open
Date 2021 Sep 13
PMID 34514351
Citations 27
Abstract

Objectives: Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department.

Methods and Materials: We labeled a gold-standard corpus of EHR clinical note sentences (n = 4063) with 6 SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). To compare the performance of the different MLL classification methods, we adopted 5 common evaluation measures: accuracy, average precision-recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss, with the F1 score as the primary evaluation metric.
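As a rough illustration of how set-based multi-label metrics behave, the following Python sketch computes Hamming loss, exact-match (subset) accuracy, and micro-averaged F1 from binary indicator rows, one row per sentence and one column per SDH domain. It is not the authors' evaluation code: reading "accuracy" as exact-match accuracy and using micro averaging for F1 are assumptions, since the abstract specifies neither. Ranking-based measures such as AP and AUC-ROC need predicted scores rather than hard labels and are not shown here.

```python
from typing import Sequence

# Binary indicator matrix: one row per sentence, one column per SDH domain.
Matrix = Sequence[Sequence[int]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual (sentence, domain) label decisions that are wrong."""
    cells = errors = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            cells += 1
            errors += int(t != p)
    return errors / cells


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of sentences whose whole label vector is predicted exactly (assumed meaning of 'accuracy')."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 from true positives, false positives, and false negatives pooled over all domains."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Three toy sentences, six hypothetical SDH domain columns.
    y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
    y_pred = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]
    print(round(hamming_loss(y_true, y_pred), 3))     # 2 wrong cells out of 18 -> 0.111
    print(round(subset_accuracy(y_true, y_pred), 3))  # only sentence 2 matches exactly -> 0.333
    print(round(micro_f1(y_true, y_pred), 3))         # tp=2, fp=1, fn=1 -> 0.667
```

The toy example shows why the measures can disagree: two of three sentences are wrong under exact match, yet only 2 of 18 individual label decisions are wrong, so Hamming loss looks much more forgiving than subset accuracy.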

Results: Our results suggest that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of the MLL models across SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (n = 1119) had at least one positive documentation of SDH.
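Because AUC-ROC varies so widely from one SDH domain to another, it can help to see it computed one label at a time. The Python sketch below uses the pairwise (Mann-Whitney) definition of AUC-ROC: the probability that a randomly chosen positive sentence receives a higher score than a randomly chosen negative one, with ties counting one half. It also shows one plausible way to obtain a patient-level share like the 44.6%, assuming a patient counts as positive when any of their sentences carries any SDH label. The function names are hypothetical, and this is not the authors' code.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC is undefined when a domain has only one class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_domain_auc(y_true: Sequence[Sequence[int]],
                   y_score: Sequence[Sequence[float]]) -> List[float]:
    """One AUC-ROC per column (SDH domain) of the indicator and score matrices."""
    n_domains = len(y_true[0])
    return [
        binary_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_domains)
    ]


def share_of_patients_with_sdh(notes: Dict[str, List[List[int]]]) -> float:
    """Fraction of patients with at least one sentence carrying any SDH label (assumed definition)."""
    positive = sum(1 for sentences in notes.values() if any(any(row) for row in sentences))
    return positive / len(notes)


if __name__ == "__main__":
    y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
    y_score = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.9], [0.3, 0.4]]
    # Domain 0 is perfectly separated; in domain 1 one negative outranks the positive.
    print(per_domain_auc(y_true, y_score))
    print(share_of_patients_with_sdh({"p1": [[0, 0], [1, 0]], "p2": [[0, 0]]}))  # 0.5
```

The pairwise loop is quadratic in the number of sentences per domain; for large corpora a rank-based computation gives the same value in O(n log n). A domain with very few positive sentences yields an AUC estimate from only a handful of pairs, which is one reason per-domain values can be extreme.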

Discussion and Conclusion: The proposed approach, training an MLL model on an SDH-rich data source, can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with the lexical diversity of health professionals' documentation and with the auto-generation of clinical note sentences that document SDH.
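The abstract does not say how lexical diversity was measured. One simple measure is the type-token ratio (distinct word forms divided by total word tokens), sketched below in Python as an assumption rather than the paper's method. It illustrates why templated or auto-generated sentences, which repeat the same wording, tend to score low compared with free-text documentation. The example sentences are hypothetical.

```python
import re
from typing import Iterable


def type_token_ratio(sentences: Iterable[str]) -> float:
    """Distinct lower-cased word forms divided by total word tokens across the sentences."""
    tokens = []
    for sentence in sentences:
        # Crude tokenizer: runs of letters, digits, and apostrophes after lower-casing.
        tokens.extend(re.findall(r"[a-z0-9']+", sentence.lower()))
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Hypothetical examples: one templated sentence repeated vs. two free-text variants.
    templated = ["Patient is homeless.", "Patient is homeless."]
    free_text = ["Pt staying in a shelter.", "Lives in car since May."]
    print(type_token_ratio(templated))  # 3 distinct / 6 tokens -> 0.5
    print(type_token_ratio(free_text))  # 9 distinct / 10 tokens -> 0.9
```

Raw type-token ratio falls as text length grows, so comparing SDH domains with very different sentence counts would call for a length-corrected variant such as a moving-average type-token ratio.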

Citing Articles

SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from medical notes.

Gu Z, He L, Naeem A, Chan P, Mohamed A, Khalil H medRxiv. 2025.

PMID: 40034759 PMC: 11875322. DOI: 10.1101/2025.02.19.25322576.


Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Protocol for a Systematic Review.

Rajwal S, Zhang Z, Chen Y, Rogers H, Sarker A, Xiao Y JMIR Res Protoc. 2025; 14:e66094.

PMID: 39836952 PMC: 11795155. DOI: 10.2196/66094.


The ENACT network is acting on housing instability and the unhoused using the open health natural language processing toolkit.

Harris D, Fu S, Wen A, Corbeau A, Henderson D, Hilsman J J Clin Transl Sci. 2024; 8(1):e98.

PMID: 39655040 PMC: 11626605. DOI: 10.1017/cts.2024.543.


Big Data, Big Insights: Leveraging Data Analytics to Unravel Cardiovascular Exposome Complexities.

Ibrahim R, Pham H, Nasir K, Hahad O, Sabharwal A, Al-Kindi S Methodist Debakey Cardiovasc J. 2024; 20(5):111-123.

PMID: 39525379 PMC: 11546329. DOI: 10.14797/mdcvj.1467.


On the development and validation of large language model-based classifiers for identifying social determinants of health.

Gabriel R, Litake O, Simpson S, Burton B, Waterman R, Macias A Proc Natl Acad Sci U S A. 2024; 121(39):e2320716121.

PMID: 39284061 PMC: 11441499. DOI: 10.1073/pnas.2320716121.

