
The Role of Large Language Models in Self-care: a Study and Benchmark on Medicines and Supplement Guidance Accuracy

Overview
Publisher: Springer
Specialties: Pharmacology, Pharmacy
Date: 2024 Dec 7
PMID: 39644377
Abstract

Background: The recent surge in the capabilities of artificial intelligence systems, particularly large language models, is also impacting the medical and pharmaceutical field in a major way. Beyond specialized uses in diagnostics and data discovery, these tools have now become accessible to the general public.

Aim: The study aimed to critically analyse the current performance of large language models in answering patients' self-care questions regarding medications and supplements.

Method: Answers from six major language models were analysed for correctness, language-independence, context-sensitivity, and reproducibility using a newly developed reference set of questions and a scoring matrix.

Results: The investigated large language models are capable of answering a clear majority of self-care questions accurately, providing relevant health information. However, substantial variability in the responses, including potentially unsafe advice, was observed, influenced by language, question structure, user context, and time. GPT 4.0 scored highest on average, while GPT 3.5, Gemini, and Gemini Advanced had varied scores. Responses were context- and language-sensitive. In terms of consistency over time, Perplexity had the worst performance.

Conclusion: Given the high-quality output of large language models, their potential in self-care applications is undeniable. The newly created benchmark can facilitate further validation and guide the establishment of strict safeguards to combat the sizable risk of misinformation in order to reach a more favourable risk/benefit ratio when this cutting-edge technology is used by patients.

Citing Articles

Application of Artificial Intelligence Generated Content in Medical Examinations.

Li R, Wu T. Adv Med Educ Pract. 2025; 16:331-339.

PMID: 40026780 PMC: 11871906. DOI: 10.2147/AMEP.S492895.
