
Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Overview
Publisher JMIR Publications
Date 2023 Jun 30
PMID 37389908
Abstract

Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI.

Objective: We aimed to assess the reliability of medical information provided by ChatGPT.

Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT.
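The guideline part of this protocol implies a fixed number of prompts: 5 conditions × 5 rephrased recommendations gives 25 guideline questions, each asked 3 times, for 75 prompts in total. The minimal Python sketch below enumerates that design and computes internal consistency as the share of questions whose three repeated answers received the same rating. That definition of consistency, the rating labels, and the index-only question keys are assumptions for illustration. The abstract does not spell out the exact metric, and this is not presented as the authors' procedure.

```python
from itertools import product

# Conditions named in the abstract's Results section.
CONDITIONS = [
    "gallstone disease",
    "pancreatitis",
    "liver cirrhosis",
    "pancreatic cancer",
    "hepatocellular carcinoma",
]
RECOMMENDATIONS_PER_CONDITION = 5  # from the Methods section
REPEATS = 3                        # every query was repeated 3 times


def build_queries():
    """Enumerate (condition, recommendation index, repeat) triples.

    The recommendation wording is not given in the abstract, so each
    question is identified only by a hypothetical index.
    """
    return list(product(CONDITIONS,
                        range(1, RECOMMENDATIONS_PER_CONDITION + 1),
                        range(1, REPEATS + 1)))


def internal_consistency(ratings):
    """Share of questions whose repeated answers all got the same rating.

    `ratings` maps a question key to the list of ratings of its repeats.
    This is an assumed definition of "internal consistency".
    """
    consistent = sum(1 for r in ratings.values() if len(set(r)) == 1)
    return consistent / len(ratings)


queries = build_queries()
print(len(queries))  # 5 * 5 * 3 = 75 prompts

# Hypothetical ratings: every question rated identically in all 3 repeats.
ratings = {(c, i): ["agree"] * REPEATS
           for c, i in product(CONDITIONS,
                               range(1, RECOMMENDATIONS_PER_CONDITION + 1))}
print(internal_consistency(ratings))  # 1.0, i.e. 100%
```

With one inconsistent question out of 25, the same function would return 0.96 instead of 1.0.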

Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) out of 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%.
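The abstract reports a Fleiss κ of 0.78 but not the underlying ratings. The following Python sketch implements the standard Fleiss' kappa formula for a subjects-by-categories table of rating counts, to show how such a value is derived. The two categories ("agree"/"disagree") and the four-question toy data are hypothetical; they do not reproduce the study's result.

```python
from collections import Counter


def fleiss_kappa(table):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    table[i][j] = number of raters who put subject i in category j.
    Every row must sum to the same number of raters n >= 2.
    """
    N = len(table)
    n = sum(table[0])
    k = len(table[0])
    if any(sum(row) != n for row in table):
        raise ValueError("each subject needs the same number of ratings")
    # Per-subject agreement P_i: share of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    p_bar = sum(p_i) / N
    # Overall category proportions p_j and expected chance agreement P_e.
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def to_table(labels_per_subject, categories):
    """Convert each subject's rater labels into a row of category counts."""
    table = []
    for labels in labels_per_subject:
        counts = Counter(labels)
        table.append([counts[c] for c in categories])
    return table


# Hypothetical: 2 raters judge whether 4 ChatGPT answers agree with a guideline.
labels = [("agree", "agree"), ("agree", "disagree"),
          ("disagree", "disagree"), ("agree", "agree")]
table = to_table(labels, ["agree", "disagree"])
print(round(fleiss_kappa(table), 3))  # 0.467
```

With only two raters, Fleiss' κ reduces to Scott's π; it is used here because it is the statistic the abstract names.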

Conclusions: ChatGPT provides medical information of quality comparable to that of static information available on the internet. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.
