ChatGPT Earns American Board Certification in Hand Surgery

Overview

Journal Hand Surg Rehabil

Date 2024 Mar 29

PMID 38552842

Authors

Diane Ghanem

Joseph E Nassar

Joseph El Bachour

Tammam Hanna

Abstract

Purpose: Artificial Intelligence (AI), and specifically ChatGPT, has shown potential in healthcare, yet its performance in specialized medical examinations such as the Orthopaedic Surgery In-Training Examination and the European Board of Hand Surgery diploma has been inconsistent. This study aims to evaluate the capability of ChatGPT-4 to pass the American Hand Surgery Certifying Examination.

Methods: ChatGPT-4 was tested on the 2019 American Society for Surgery of the Hand (ASSH) Self-Assessment Exam. All 200 questions available online (https://onlinecme.assh.org) were retrieved. All media-containing questions were flagged and carefully reviewed. Eight media-containing questions were excluded as they either relied purely on videos or could not be rationalized from the presented information. Descriptive statistics were used to summarize the performance (% correct) of ChatGPT-4. The ASSH report was used to compare ChatGPT-4's performance to that of the 322 physicians who completed the 2019 ASSH self-assessment.

Results: ChatGPT-4 answered 192 questions with an overall score of 61.98%. Performance on media-containing questions was 55.56%, while on non-media questions it was 65.83%, with no statistical difference in performance based on media inclusion. Despite scoring below the average physician's performance, ChatGPT-4 outperformed in the 'vascular' section with 81.82%. Its performance was lower in the 'bone and joint' (48.54%) and 'neuromuscular' (56.25%) sections.
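The percentages above can be checked with a short calculation. The sketch below is illustrative only: the abstract reports percentages, not raw counts, so the counts used here (119/192 overall, 40/72 media, 79/120 non-media) are assumptions inferred because they reproduce the reported figures. The abstract also does not name the statistical test that was used; a Pearson chi-square test on a 2x2 table (without continuity correction) is assumed here as one plausible choice.

```python
import math


def percent_correct(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * correct / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# ASSUMED counts, inferred from the reported percentages (not stated in the abstract).
MEDIA_CORRECT, MEDIA_TOTAL = 40, 72
NON_MEDIA_CORRECT, NON_MEDIA_TOTAL = 79, 120

overall = percent_correct(MEDIA_CORRECT + NON_MEDIA_CORRECT,
                          MEDIA_TOTAL + NON_MEDIA_TOTAL)
media = percent_correct(MEDIA_CORRECT, MEDIA_TOTAL)
non_media = percent_correct(NON_MEDIA_CORRECT, NON_MEDIA_TOTAL)

# Rows: media / non-media; columns: correct / incorrect.
stat, p_value = chi_square_2x2(
    MEDIA_CORRECT, MEDIA_TOTAL - MEDIA_CORRECT,
    NON_MEDIA_CORRECT, NON_MEDIA_TOTAL - NON_MEDIA_CORRECT,
)

if __name__ == "__main__":
    print(f"overall {overall:.2f}%, media {media:.2f}%, non-media {non_media:.2f}%")
    print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Under these assumed counts the p-value is above 0.05, which is consistent with the reported absence of a statistically significant difference between media and non-media questions. A different test (for example Fisher's exact test) would give a somewhat different p-value.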

Conclusions: ChatGPT-4 achieved a good overall score of 61.98%. This AI language model demonstrates significant capability in processing and answering specialized medical examination questions, albeit with room for improvement in areas requiring complex clinical judgment and nuanced interpretation. ChatGPT-4's proficiency is influenced by the structure and language of the examination, and it is no replacement for the depth of trained medical specialists. This study underscores the supportive role of AI in medical education and clinical decision-making while highlighting its current limitations in nuanced fields such as hand surgery.
