Leveraging Large Language Models for Generating Responses to Patient Messages: A Subjective Analysis
Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.
Materials and Methods: Using a dataset of messages and responses extracted from the patient portal of a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining this rewritten dataset with the original, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we generated responses to 10 representative primary care patient portal questions. We asked primary care physicians to review the responses generated by our models and by ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.
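As a minimal sketch of the response-rewriting step described above, the following shows how physician responses might be reformatted via the OpenAI API into longer, empathetic patient-education paragraphs. The prompt wording, model name, and function names are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch of the ChatGPT-based rewriting step; prompt text, model
# choice, and helper names are assumptions for illustration only.

def build_rewrite_prompt(patient_message: str, physician_response: str) -> str:
    """Compose an instruction asking the model to expand a terse physician
    reply into informative, empathetic patient-education paragraphs."""
    return (
        "Rewrite the physician's response into informative paragraphs that "
        "offer patient education while emphasizing empathy and "
        "professionalism.\n\n"
        f"Patient message: {patient_message}\n"
        f"Physician response: {physician_response}"
    )

def rewrite_response(client, patient_message: str, physician_response: str,
                     model: str = "gpt-3.5-turbo") -> str:
    """client is an openai.OpenAI() instance; the chat.completions.create
    call follows the OpenAI Python SDK's chat-completion interface."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": build_rewrite_prompt(patient_message,
                                            physician_response),
        }],
    )
    return completion.choices[0].message.content
```

The rewritten response pairs would then be pooled with the original portal data before the second fine-tuning pass that produces CLAIR-Long.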
Results: The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, along with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short generated concise responses similar to providers' responses. CLAIR-Long responses provided more patient-education content than CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness.
Conclusion: This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.