Leveraging Large Language Models for Generating Responses to Patient Messages: A Subjective Analysis
Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.
Materials and Methods: Using a dataset of messages and responses extracted from the patient portal of a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining this rewritten dataset with the original, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we generated responses to 10 representative primary care patient portal questions. We asked primary care physicians to review the responses generated by our models and by ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.
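As a minimal sketch of the response-rewriting step described above, the following shows how physician responses might be reformatted via the OpenAI API into longer, empathetic patient-education paragraphs. The prompt wording, model name, and function names are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch of the ChatGPT-based rewriting step; prompt text, model
# choice, and helper names are assumptions for illustration only.

def build_rewrite_prompt(patient_message: str, physician_response: str) -> str:
    """Compose an instruction asking the model to expand a terse physician
    reply into informative, empathetic patient-education paragraphs."""
    return (
        "Rewrite the physician's response into informative paragraphs that "
        "offer patient education while emphasizing empathy and "
        "professionalism.\n\n"
        f"Patient message: {patient_message}\n"
        f"Physician response: {physician_response}"
    )

def rewrite_response(client, patient_message: str, physician_response: str,
                     model: str = "gpt-3.5-turbo") -> str:
    """client is an openai.OpenAI() instance; the chat.completions.create
    call follows the OpenAI Python SDK's chat-completion interface."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": build_rewrite_prompt(patient_message,
                                            physician_response),
        }],
    )
    return completion.choices[0].message.content
```

The rewritten response pairs would then be pooled with the original portal data before the second fine-tuning pass that produces CLAIR-Long.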
Results: The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, along with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short generated concise responses similar to providers' responses. CLAIR-Long responses provided more patient-education content than CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness.
Conclusion: This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.