
Validity and Reliability of an Instrument Evaluating the Performance of Intelligent Chatbot: the Artificial Intelligence Performance Instrument (AIPI)

Overview
Date 2023 Sep 12
PMID 37698703
Abstract

Objectives: To evaluate the reliability and validity of the Artificial Intelligence Performance Instrument (AIPI).

Methods: Medical records of patients consulting an otolaryngology department were evaluated by physicians and by ChatGPT for differential diagnosis, management, and treatment. ChatGPT's performance was rated twice with the AIPI within a 7-day period to assess test-retest reliability. Internal consistency was evaluated using Cronbach's α. Internal validity was evaluated by comparing the AIPI scores of the clinical cases rated by ChatGPT and by 2 blinded practitioners. Convergent validity was measured by comparing the AIPI score with a modified version of the Ottawa Clinic Assessment Tool (OCAT). Interrater reliability was assessed using Kendall's tau.

Results: Forty-five patients (28 females) completed the evaluations. Cronbach's α indicated adequate internal consistency of the AIPI (α = 0.754). Test-retest reliability was moderate-to-strong for the AIPI items and total score (r = 0.486, p = 0.001). The mean AIPI score of the senior otolaryngologist was significantly higher than that of ChatGPT, supporting adequate internal validity (p = 0.001). Convergent validity analysis showed a moderate, significant correlation between the AIPI and the modified OCAT (r = 0.319; p = 0.044). Interrater reliability analysis showed significant positive concordance between both otolaryngologists for the patient feature, diagnosis, additional examination, and treatment subscores, as well as for the AIPI total score.

Conclusions: The AIPI is a valid and reliable instrument for assessing the performance of ChatGPT in ear, nose, and throat conditions. Future studies are needed to investigate the usefulness of the AIPI in medicine and surgery and to evaluate its psychometric properties in those fields.

Citing Articles

Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study.

Choo S, Yoo S, Endo K, Truong B, Son M. JMIR Nurs. 2025; 8:e63058.

PMID: 40014000 PMC: 11884306. DOI: 10.2196/63058.


A radiopathomics model for predicting large-number cervical lymph node metastasis in clinical N0 papillary thyroid carcinoma.

Xiao W, Zhou W, Yuan H, Liu X, He F, Hu X. Eur Radiol. 2025.

PMID: 39881038 DOI: 10.1007/s00330-025-11377-8.


Artificial intelligence for image recognition in diagnosing oral and oropharyngeal cancer and leukoplakia.

Schmidl B, Hutten T, Pigorsch S, Stogbauer F, Hoch C, Hussain T. Sci Rep. 2025; 15(1):3625.

PMID: 39880876 PMC: 11779835. DOI: 10.1038/s41598-025-85920-4.


Enhancing Multilingual Patient Education: ChatGPT's Accuracy and Readability for SSNHL Queries in English and Spanish.

Ajit-Roger E, Moise A, Peralta C, Orishchak O, Daniel S. OTO Open. 2024; 8(4):e70048.

PMID: 39664064 PMC: 11633712. DOI: 10.1002/oto2.70048.


Harnessing the Power of ChatGPT in Cardiovascular Medicine: Innovations, Challenges, and Future Directions.

Leon M, Ruaengsri C, Pelletier G, Bethencourt D, Shibata M, Flores M. J Clin Med. 2024; 13(21).

PMID: 39518681 PMC: 11546989. DOI: 10.3390/jcm13216543.


References
1.
Pernencar C, Saboia I, Dias J. How Far Can Conversational Agents Contribute to IBD Patient Health Care-A Review of the Literature. Front Public Health. 2022; 10:862432. PMC: 9282671. DOI: 10.3389/fpubh.2022.862432.

2.
Wahlster W. Understanding computational dialogue understanding. Philos Trans A Math Phys Eng Sci. 2023; 381(2251):20220049. DOI: 10.1098/rsta.2022.0049.

3.
Hill-Yardin E, Hutchinson M, Laycock R, Spencer S. A Chat(GPT) about the future of scientific publishing. Brain Behav Immun. 2023; 110:152-154. DOI: 10.1016/j.bbi.2023.02.022.

4.
Mohammad B, Supti T, Alzubaidi M, Shah H, Alam T, Shah Z. The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review. Stud Health Technol Inform. 2023; 305:644-647. DOI: 10.3233/SHTI230580.

5.
Rekman J, Hamstra S, Dudek N, Wood T, Seabrook C, Gofton W. A New Instrument for Assessing Resident Competence in Surgical Clinic: The Ottawa Clinic Assessment Tool. J Surg Educ. 2016; 73(4):575-82. DOI: 10.1016/j.jsurg.2016.02.003.