Performance of Artificial Intelligence Chatbots in Sleep Medicine Certification Board Exams: ChatGPT Versus Google Bard

Overview

Journal: Eur Arch Otorhinolaryngol

Specialty: Otorhinolaryngology

Date: 2023 Dec 20

PMID: 38117307

Authors

Ryan Chin Taw Cheong, Kenny Peter Pang, Samit Unadkat, Venkata Mcneillis, Andrew Williamson, Jonathan Joseph, Premjit Randhawa, Peter Andrews, Vinidh Paleri

Abstract

Purpose: To compare the performance of GPT-3.5, GPT-4 and Google Bard on self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.

Methods: A total of 301 text-based, single-best-answer multiple-choice questions, each with four answer options and spanning 10 categories, were transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first response generated by each chatbot was taken as its answer and checked for accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine for each question. Human sleep medicine specialists require a global score of 80% or above to pass each exam category.
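The scoring procedure described in the Methods can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function names `is_correct`, `score_by_category` and `overall_score`, the (question_id, category, gold_answer) tuple layout and the toy data are all hypothetical. It scores only each chatbot's first answer against the gold-standard key and applies the 80% pass mark to each category separately.

```python
from collections import defaultdict

PASS_MARK = 80.0  # per-category pass threshold (percent) stated in the Methods


def is_correct(first_response, gold_answer):
    """Compare a chatbot's first answer letter with the gold-standard letter."""
    return first_response.strip().upper() == gold_answer.strip().upper()


def score_by_category(questions, responses):
    """Return {category: (percent_correct, passed)}.

    questions: list of (question_id, category, gold_answer) tuples (hypothetical layout)
    responses: dict of question_id -> first answer letter; a missing entry counts as wrong
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, category, gold in questions:
        total[category] += 1
        if is_correct(responses.get(qid, ""), gold):
            correct[category] += 1
    results = {}
    for category, n in total.items():
        pct = 100.0 * correct[category] / n
        results[category] = (round(pct, 1), pct >= PASS_MARK)
    return results


def overall_score(questions, responses):
    """Percentage of all questions answered correctly, pooled across categories."""
    hits = sum(is_correct(responses.get(qid, ""), gold) for qid, _, gold in questions)
    return round(100.0 * hits / len(questions), 1)


# Hypothetical toy data, not questions from the study.
questions = [
    ("q1", "Insomnia", "A"), ("q2", "Insomnia", "B"), ("q3", "Insomnia", "C"),
    ("q4", "Insomnia", "D"), ("q5", "Insomnia", "A"),
    ("q6", "Parasomnias", "B"), ("q7", "Parasomnias", "C"),
]
responses = {"q1": "A", "q2": "b", "q3": "C", "q4": "D", "q5": "B", "q6": "B"}

print(score_by_category(questions, responses))
# {'Insomnia': (80.0, True), 'Parasomnias': (50.0, False)}
print(overall_score(questions, responses))
# 71.4
```

Keeping the pass decision per category mirrors the Methods, where each exam category is passed or failed on its own; the pooled overall score is reported separately.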

Results: GPT-4 achieved the pass mark of 80% or above in five of the 10 exam categories: the Normal Sleep and Variants Self-Assessment Exam (2021), the Circadian Rhythm Sleep-Wake Disorders Self-Assessment Exam (2021), the Insomnia Self-Assessment Exam (2022), the Parasomnias Self-Assessment Exam (2022) and the Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 outperformed the other two chatbots in every exam category, and its overall score of 68.1% was significantly higher than those of GPT-3.5 (46.8%) and Google Bard (45.5%) (p < 0.001). There was no significant difference in overall score between GPT-3.5 and Google Bard.
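The overall percentages above can be related to raw counts and checked with a simple significance test. The sketch below assumes the overall score is the number of correct answers divided by all 301 questions; under that assumption, 205, 141 and 137 correct answers are the only integer counts that round to 68.1%, 46.8% and 45.5%. The abstract does not name the statistical test used, so the unpaired two-proportion z-test here (the function `two_proportion_z_test` is hypothetical) is a swapped-in illustration only. Because every chatbot answered the same questions, a paired test such as McNemar's would suit the raw per-question data better.

```python
import math

N = 301  # questions answered by every chatbot (from the Methods)

# ASSUMPTION: correct-answer counts back-calculated from the reported overall
# percentages; the abstract reports percentages, not raw counts.
counts = {"GPT-4": 205, "GPT-3.5": 141, "Google Bard": 137}


def two_proportion_z_test(x1, x2, n1, n2):
    """Unpaired two-sided z-test for a difference between two proportions.

    Illustrative only: it ignores that all models answered the same questions,
    which a paired test (e.g. McNemar's) would account for.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal tail
    return z, p_value


for name, x in counts.items():
    print(f"{name}: {100 * x / N:.1f}%")

for a, b in [("GPT-4", "GPT-3.5"), ("GPT-4", "Google Bard"), ("GPT-3.5", "Google Bard")]:
    z, p = two_proportion_z_test(counts[a], counts[b], N, N)
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.2g}")
```

With these assumed counts, both GPT-4 comparisons give p well below 0.001 and the GPT-3.5 versus Google Bard comparison is far from significant, which matches the direction of the reported results without reproducing the authors' analysis.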

Conclusions: Otolaryngologists and sleep medicine physicians have a crucial role to play, through agile and robust research, in ensuring that the next generation of AI chatbots is built safely and responsibly.
