
Repeatability, Reproducibility, and Diagnostic Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Emergency Department Triage Using the Canadian Triage and Acuity Scale

Overview
Journal: CJEM
Publisher: Springer
Specialty: Emergency Medicine
Date: 2024 Jan 11
PMID: 38206515
Abstract

Purpose: The release of the ChatGPT prototype to the public in November 2022 drastically reduced the barrier to using artificial intelligence by allowing easy access to a large language model through only a simple web interface. One situation where ChatGPT could be useful is in triaging patients arriving at the emergency department. This study aimed to answer the research question: "Can emergency physicians use ChatGPT to accurately triage patients using the Canadian Triage and Acuity Scale (CTAS)?"

Methods: Six unique prompts were developed independently by five emergency physicians. An automated script was used to query ChatGPT with each of the 6 prompts combined with 61 validated and previously published patient vignettes. Thirty repetitions of each combination were performed for a total of 10,980 simulated triages.
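The query design described in the Methods can be sketched as a full factorial grid. This is only an illustration of the arithmetic behind the 10,980 simulated triages, not the authors' script, which the abstract does not describe. The function name `build_query_plan` and the use of integer indices in place of real prompts and vignettes are assumptions made for this example, and no call to ChatGPT is made.

```python
from itertools import product

# Design parameters taken from the abstract.
N_PROMPTS = 6
N_VIGNETTES = 61
N_REPETITIONS = 30


def build_query_plan(n_prompts, n_vignettes, n_repetitions):
    """Return every (prompt, vignette, repetition) index triple.

    Hypothetical helper: indices stand in for the actual prompt and
    vignette texts, which are not given in the abstract.
    """
    return list(
        product(range(n_prompts), range(n_vignettes), range(n_repetitions))
    )


plan = build_query_plan(N_PROMPTS, N_VIGNETTES, N_REPETITIONS)
print(len(plan))  # 6 * 61 * 30 = 10980
```

Each triple would correspond to one query, so repeating the same prompt and vignette pair 30 times is what allows repeatability to be measured separately from differences between prompts.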

Results: In 99.6% of 10,980 queries, a CTAS score was returned. However, there was considerable variation in the results. Repeatability (use of the same prompt repeatedly) was responsible for 21.0% of overall variation. Reproducibility (use of different prompts) was responsible for 4.0% of overall variation. Overall accuracy of ChatGPT in triaging simulated patients was 47.5%, with a 13.7% under-triage rate and a 38.7% over-triage rate. More detailed prompt text was associated with greater reproducibility but only a minimal increase in accuracy.
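As a rough illustration of how accuracy, over-triage, and under-triage rates of this kind can be computed, the sketch below compares an assigned CTAS level with a reference level. It relies on the fact that CTAS runs from level 1 (most urgent) to level 5 (least urgent), so a lower assigned number than the reference counts as over-triage. The function names and the sample data are hypothetical, and the abstract does not state how queries without a returned score were handled, so they are simply left out here.

```python
from collections import Counter


def classify_triage(predicted, reference):
    """Label one triage decision as 'correct', 'over', or 'under'.

    CTAS level 1 is the most urgent and level 5 the least urgent, so a
    numerically lower predicted level means the patient was over-triaged.
    """
    if predicted == reference:
        return "correct"
    return "over" if predicted < reference else "under"


def triage_rates(pairs):
    """Return the share of correct, over-, and under-triaged decisions.

    `pairs` is a list of (predicted_level, reference_level) tuples.
    """
    if not pairs:
        raise ValueError("at least one (predicted, reference) pair is required")
    counts = Counter(classify_triage(p, r) for p, r in pairs)
    total = len(pairs)
    return {label: counts[label] / total for label in ("correct", "over", "under")}


# Made-up example data, not taken from the study.
sample = [(2, 2), (1, 3), (4, 3), (3, 3)]
rates = triage_rates(sample)
print(rates)
```

In this made-up sample, two of four decisions match the reference, one is more urgent than the reference, and one is less urgent.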

Conclusions: This study suggests that the current ChatGPT large language model is not sufficient for emergency physicians to triage simulated patients using the Canadian Triage and Acuity Scale due to poor repeatability and accuracy. Medical practitioners should be aware that while ChatGPT can be a valuable tool, it may lack consistency and may frequently provide false information.

Citing Articles

Medical validity and layperson interpretation of emergency visit recommendations by the GPT model: A cross-sectional study.

Tanaka C, Kinoshita T, Okada Y, Satoh K, Homma Y, Suzuki K. Acute Med Surg. 2025; 12(1):e70042.

PMID: 40078650 PMC: 11897724. DOI: 10.1002/ams2.70042.


Evaluating AI performance in nephrology triage and subspecialty referrals.

Koirala P, Thongprayoon C, Miao J, Garcia Valencia O, Sheikh M, Suppadungsuk S. Sci Rep. 2025; 15(1):3455.

PMID: 39870788 PMC: 11772766. DOI: 10.1038/s41598-025-88074-5.


Leveraging Large Language Models for Improved Understanding of Communications With Patients With Cancer in a Call Center Setting: Proof-of-Concept Study.

Cho S, Lee M, Yu J, Yoon J, Choi J, Jung K. J Med Internet Res. 2024; 26:e63892.

PMID: 39661975 PMC: 11669882. DOI: 10.2196/63892.


Concordance between humans and GPT-4 in appraising the methodological quality of case reports and case series using the Murad tool.

Tarakji Z, Kanaan A, Saadi S, Firwana M, Kabbara Allababidi A, Abusalih M. BMC Med Res Methodol. 2024; 24(1):266.

PMID: 39497032 PMC: 11533388. DOI: 10.1186/s12874-024-02372-6.


Reply to Zhang et al. regarding 'Re: ChatGPT encounters multiple opportunities and challenges in neurosurgery'.

Yang M, Zheng B, Song J. Int J Surg. 2024; 111(1):1680-1681.

PMID: 39377423 PMC: 11745614. DOI: 10.1097/JS9.0000000000002106.

