
Use of Large Language Model-based Chatbots in Managing the Rehabilitation Concerns and Education Needs of Outpatient Stroke Survivors and Caregivers

Overview
Date 2024 May 24
PMID 38784703
Abstract

Background: The utility of large language model (LLM)-based artificial intelligence (AI) chatbots in many aspects of healthcare is becoming apparent, though their ability to address patient concerns remains unknown. We sought to evaluate the performance of two well-known, freely accessible chatbots, ChatGPT and Google Bard, in responding to common questions about stroke rehabilitation posed by patients and their caregivers.

Methods: We collected questions from outpatients and their caregivers through a survey, categorised them by theme, and created representative questions to be posed to both chatbots. We then evaluated the chatbots' responses based on accuracy, safety, relevance, and readability. Interrater agreement was also tracked.

Results: Although both chatbots achieved similar overall scores, Google Bard performed slightly better in relevance and safety. Both provided readable responses with some general accuracy, but they struggled with hallucinated responses, often lacked specificity, and failed to recognise emotionally charged situations with the potential to turn dangerous. Additionally, interrater agreement was low, highlighting the variability in physician acceptance of their responses.

Conclusions: AI chatbots show potential in patient-facing support roles, but issues remain regarding safety, accuracy, and relevance. Future chatbots should address these problems to ensure that they can reliably and independently manage the concerns and questions of stroke patients and their caregivers.

Citing Articles

Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross-Sectional Investigation.

Sezgin E, Jackson D, Kocaballi A, Bibart M, Zupanec S, Landier W. Cancer Med. 2025; 14(1):e70554.

PMID: 39776222; PMC: 11705392; DOI: 10.1002/cam4.70554.


A Performance Evaluation of Large Language Models in Keratoconus: A Comparative Study of ChatGPT-3.5, ChatGPT-4.0, Gemini, Copilot, Chatsonic, and Perplexity.

Reyhan A, Mutaf C, Uzun I, Yuksekyayla F. J Clin Med. 2024; 13(21).

PMID: 39518652; PMC: 11547000; DOI: 10.3390/jcm13216512.
