Performance of Artificial Intelligence Chatbots in Sleep Medicine Certification Board Exams: ChatGPT Versus Google Bard

Overview
Date 2023 Dec 20
PMID 38117307
Abstract

Purpose: To compare the performance of GPT-3.5, GPT-4 and Google Bard on self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.

Methods: A total of 301 text-based, single-best-answer multiple-choice questions with four answer options each, across 10 categories, were included in the study and transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first response generated for each question was taken and checked for accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine. Human sleep medicine specialists must score 80% or above to pass each exam category.
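The scoring procedure described in the Methods (match each first response to the gold-standard answer, then apply the 80% pass mark per category) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `score_by_category`, the data layout, and the toy questions are all assumptions made for the example.

```python
from collections import defaultdict

PASS_MARK = 0.80  # pass threshold per exam category, as stated in the abstract


def score_by_category(responses, answer_key):
    """Return {category: (correct, total, accuracy, passed)}.

    responses:  {question_id: chosen option, e.g. "B"}  (hypothetical layout)
    answer_key: {question_id: (category, gold-standard option)}
    """
    tally = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for qid, (category, gold) in answer_key.items():
        tally[category][1] += 1
        # A missing response counts as incorrect; comparison ignores case.
        if responses.get(qid, "").strip().upper() == gold.strip().upper():
            tally[category][0] += 1
    return {
        cat: (correct, total, correct / total, correct / total >= PASS_MARK)
        for cat, (correct, total) in tally.items()
    }


# Made-up toy data for illustration only; not taken from the study.
answer_key = {
    "q1": ("Insomnia", "A"),
    "q2": ("Insomnia", "C"),
    "q3": ("Insomnia", "B"),
    "q4": ("Insomnia", "D"),
    "q5": ("Insomnia", "A"),
    "q6": ("Parasomnias", "B"),
    "q7": ("Parasomnias", "D"),
}
responses = {
    "q1": "A", "q2": "C", "q3": "B", "q4": "D", "q5": "C",
    "q6": "b", "q7": "A",
}

results = score_by_category(responses, answer_key)
for category, (correct, total, accuracy, passed) in results.items():
    print(f"{category}: {correct}/{total} = {accuracy:.0%}, passed={passed}")
```

In this toy run the Insomnia category reaches exactly 80% and therefore passes, which reflects the "80% or above" wording of the pass mark.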

Results: GPT-4 achieved the pass mark of 80% or above in five of the 10 exam categories: the Normal Sleep and Variants Self-Assessment Exam (2021), Circadian Rhythm Sleep-Wake Disorders Self-Assessment Exam (2021), Insomnia Self-Assessment Exam (2022), Parasomnias Self-Assessment Exam (2022) and Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 outperformed both other models in all exam categories and achieved a higher overall score of 68.1%, compared with 46.8% for GPT-3.5 and 45.5% for Google Bard; this difference was statistically significant (p < 0.001). There was no significant difference in overall score between GPT-3.5 and Google Bard.
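To give a sense of the size of these differences, the sketch below runs a pooled two-proportion z-test on the overall scores. Several things here are assumptions rather than facts from the abstract: the abstract does not name the statistical test the authors used; the correct-answer counts (205, 141, 137) are reconstructed by rounding the reported percentages against 301 questions; and the test treats the models' answers as independent samples, although all three models answered the same questions. A paired test such as McNemar's would suit that design better, but it needs per-question results that the abstract does not report.

```python
import math

N_QUESTIONS = 301  # total questions, from the abstract

# Assumed counts, reconstructed from the rounded percentages in the abstract.
CORRECT = {
    "GPT-4": round(0.681 * N_QUESTIONS),         # 205
    "GPT-3.5": round(0.468 * N_QUESTIONS),       # 141
    "Google Bard": round(0.455 * N_QUESTIONS),   # 137
}


def two_proportion_z_test(correct_a, correct_b, n):
    """Pooled two-sided z-test for two proportions with equal sample size n."""
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z_45, p_gpt4_vs_gpt35 = two_proportion_z_test(
    CORRECT["GPT-4"], CORRECT["GPT-3.5"], N_QUESTIONS
)
z_3b, p_gpt35_vs_bard = two_proportion_z_test(
    CORRECT["GPT-3.5"], CORRECT["Google Bard"], N_QUESTIONS
)

print(f"GPT-4 vs GPT-3.5:       z = {z_45:.2f}, p = {p_gpt4_vs_gpt35:.2g}")
print(f"GPT-3.5 vs Google Bard: z = {z_3b:.2f}, p = {p_gpt35_vs_bard:.2g}")
```

Under these assumptions the GPT-4 versus GPT-3.5 comparison falls well below p = 0.001, while the GPT-3.5 versus Google Bard comparison does not approach significance, which is consistent in direction with the reported results.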

Conclusions: Otolaryngologists and sleep medicine physicians have a crucial role to play, through agile and robust research, in ensuring that the next generation of AI chatbots is built safely and responsibly.

Citing Articles

ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions.

Tassoker M BMC Oral Health. 2025; 25(1):173.

PMID: 39893407 PMC: 11786404. DOI: 10.1186/s12903-025-05554-w.


Generative artificial intelligence in graduate medical education.

Janumpally R, Nanua S, Ngo A, Youens K Front Med (Lausanne). 2025; 11:1525604.

PMID: 39867924 PMC: 11758457. DOI: 10.3389/fmed.2024.1525604.


Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis.

Zong H, Wu R, Cha J, Wang J, Wu E, Li J J Med Internet Res. 2024; 26():e66114.

PMID: 39729356 PMC: 11724220. DOI: 10.2196/66114.


Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study.

Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z BMC Med Educ. 2024; 24(1):1372.

PMID: 39593041 PMC: 11590336. DOI: 10.1186/s12909-024-06309-x.


Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.

Kunzle P, Paris S Clin Oral Investig. 2024; 28(11):575.

PMID: 39373739 PMC: 11458639. DOI: 10.1007/s00784-024-05968-w.

