
Performance of ChatGPT on USMLE: Potential for AI-assisted Medical Education Using Large Language Models

Overview
Date 2023 Feb 22
PMID 36812645
Abstract

We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education and, potentially, clinical decision-making.

Citing Articles

Comparing ChatGPT 4.0's Performance in Interpreting Thyroid Nodule Ultrasound Reports Using ACR-TI-RADS 2017: Analysis Across Different Levels of Ultrasound User Experience.

Wakonig K, Barisch S, Kozarzewski L, Dommerich S, Lerchbaumer M Diagnostics (Basel). 2025; 15(5).

PMID: 40075883 PMC: 11899695. DOI: 10.3390/diagnostics15050635.


Investigating the Accuracy and Consistency of ChatGPT in the Management of Achilles Tendon Ruptures.

Knee C, Campbell R, Sivakumar B, Wines A, Symes M Cureus. 2025; 17(2):e78433.

PMID: 40046346 PMC: 11882158. DOI: 10.7759/cureus.78433.


ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini.

Prazeres F JMIR Med Educ. 2025; 11:e65108.

PMID: 40043219 PMC: 11902880. DOI: 10.2196/65108.


GPT-4 generated answer rationales to multiple choice assessment questions in undergraduate medical education.

Chen P, Day W, Pekson R, Barrientos J, Burton W, Ludwig A BMC Med Educ. 2025; 25(1):333.

PMID: 40038669 PMC: 11877964. DOI: 10.1186/s12909-025-06862-z.


Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology.

Masanneck L, Meuth S, Pawlitzki M NPJ Digit Med. 2025; 8(1):137.

PMID: 40038423 PMC: 11880332. DOI: 10.1038/s41746-025-01536-y.

