
Assessing Knowledge About Medical Physics in Language-generative AI with Large Language Model: Using the Medical Physicist Exam

Overview
Date 2024 Sep 10
PMID 39254919
Abstract

This study aimed to evaluate the performance of language-generative AI with large language models in answering the Japanese medical physicist examination and to provide a benchmark of their knowledge about medical physics. We used questions from Japan's 2018, 2019, 2020, 2021, and 2022 medical physicist board examinations, which covered various question types, including multiple-choice questions, and focused mainly on general medicine and medical physics. ChatGPT-3.5 and ChatGPT-4.0 (OpenAI) were used, and the AI-generated answers were compared with the correct ones. The average accuracy rates were 42.2 ± 2.5% (ChatGPT-3.5) and 72.7 ± 2.6% (ChatGPT-4), showing that ChatGPT-4 was more accurate than ChatGPT-3.5 (p < 0.05 in all categories except radiation-related laws and recommendations/medical ethics). Even with the more accurate ChatGPT model, accuracy rates were below 60% in two categories: radiation metrology (55.6%) and radiation-related laws and recommendations/medical ethics (40.0%). These data provide a benchmark for knowledge about medical physics in ChatGPT and can serve as basic data for the development of various medical physics tools built on ChatGPT (e.g., radiation therapy support tools with Japanese input).
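The abstract describes the evaluation only at a high level (send each board-exam question to each model, compare against the official answer, aggregate per category, and test the difference between models). A minimal sketch of how such a benchmark could be scored is shown below; it is not the authors' code. The OpenAI model identifiers, the question data fields, and the use of Fisher's exact test for the per-category comparison are assumptions for illustration, since the abstract does not specify them.

```python
# Minimal sketch of a ChatGPT exam benchmark (illustrative assumptions, not the study's pipeline).
# Assumed data format: each question is a dict with "text", "choices", "answer", "category".
from collections import defaultdict

from openai import OpenAI           # reads OPENAI_API_KEY from the environment
from scipy.stats import fisher_exact

client = OpenAI()


def ask_model(model: str, question: dict) -> str:
    """Send one multiple-choice question and return the model's raw answer text."""
    prompt = question["text"] + "\n" + "\n".join(question["choices"])
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


def score(models: list[str], questions: list[dict]) -> dict:
    """Count correct/total answers per model and per exam category."""
    counts = {m: defaultdict(lambda: [0, 0]) for m in models}
    for q in questions:
        for m in models:
            correct, total = counts[m][q["category"]]
            answered = ask_model(m, q)
            counts[m][q["category"]] = [
                correct + int(q["answer"] in answered),  # naive string match against the key
                total + 1,
            ]
    return counts


def compare(counts: dict, category: str) -> float:
    """Fisher's exact test on correct/incorrect counts between two models (assumed test)."""
    (a_ok, a_n), (b_ok, b_n) = (counts[m][category] for m in counts)
    _, p = fisher_exact([[a_ok, a_n - a_ok], [b_ok, b_n - b_ok]])
    return p


# Example usage with the models the study compared:
# counts = score(["gpt-3.5-turbo", "gpt-4"], questions)
# p = compare(counts, "radiation metrology")
```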

Citing Articles

Tozuka R, Johno H, Amakawa A, Sato J, Muto M, Seki S. Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging. Jpn J Radiol. 2024. PMID: 39585559. DOI: 10.1007/s11604-024-01705-1.
