
Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT

Overview
Specialty: Radiology
Date: 2024 Jan 11
PMID: 38201398
Abstract

This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) systems: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The aim was to validate the performance of both systems in chest X-ray reading and to explore their potential applications in medical imaging diagnosis. A total of 2000 chest X-ray images were randomly selected from a single institution's patient database, and two radiologists evaluated the readings produced by KARA-CXR and ChatGPT. Each model's readings were assessed on five qualitative factors: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy than ChatGPT. In the 'Acceptable' accuracy category, the two observers rated KARA-CXR at 70.50% and 68.00%, and ChatGPT at 40.50% and 47.00%. Interobserver agreement was moderate for both systems (0.74 for KARA-CXR and 0.73 for ChatGPT, GPT-4). For 'False Findings', KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreement (0.96 for KARA-CXR and 0.97 for ChatGPT). In 'Location Inaccuracy' and 'Hallucinations', KARA-CXR outperformed ChatGPT by significant margins. KARA-CXR demonstrated a non-hallucination rate of 75%, significantly higher than ChatGPT's 38%. In the hallucination category, interobserver agreement was high for KARA-CXR (0.91) and moderate to high for ChatGPT (0.85). In conclusion, this study demonstrates the potential of AI and large language models in medical imaging and diagnostics. It also shows that, in the chest X-ray domain, KARA-CXR has relatively higher accuracy than ChatGPT.
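The abstract reports interobserver agreement values but does not name the statistic behind them. As an illustration only, the sketch below assumes a chance-corrected measure such as Cohen's kappa, computed for two raters who label the same readings. The function name `cohen_kappa` and the toy ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the study's agreement statistic is not stated in the
    abstract; kappa is used here purely as an illustrative example.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy ratings for ten readings (not data from the study).
observer_1 = ["acceptable"] * 7 + ["unacceptable"] * 3
observer_2 = ["acceptable"] * 6 + ["unacceptable"] * 4
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the observers agree on 9 of 10 readings (observed agreement 0.90), chance agreement is 0.54, and kappa is 0.36 / 0.46, roughly 0.78. Raw percent agreement alone would overstate reliability because it ignores agreement expected by chance.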

Citing Articles

Integrating AI and Assistive Technologies in Healthcare: Insights from a Narrative Review of Reviews.

Giansanti D, Pirrera A. Healthcare (Basel). 2025; 13(5).

PMID: 40077118 PMC: 11898476. DOI: 10.3390/healthcare13050556.


ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study.

Cano-Besquet S, Rice-Canetto T, Abou-El-Hassan H, Alarcon S, Zimmerman J, Issagholian L. Heliyon. 2025; 10(24):e40964.

PMID: 39759322 PMC: 11699242. DOI: 10.1016/j.heliyon.2024.e40964.


Artificial intelligence in fracture detection on radiographs: a literature review.

Lo Mastro A, Grassi E, Berritto D, Russo A, Reginelli A, Guerra E. Jpn J Radiol. 2024.

PMID: 39538068 DOI: 10.1007/s11604-024-01702-4.


Revolution or risk?-Assessing the potential and challenges of GPT-4V in radiologic image interpretation.

Huppertz M, Siepmann R, Topp D, Nikoubashman O, Yuksel C, Kuhl C. Eur Radiol. 2024; 35(3):1111-1121.

PMID: 39422726 PMC: 11836096. DOI: 10.1007/s00330-024-11115-6.


Advancements in Artificial Intelligence for Medical Computer-Aided Diagnosis.

Al-Antari M. Diagnostics (Basel). 2024; 14(12).

PMID: 38928680 PMC: 11202700. DOI: 10.3390/diagnostics14121265.

