
Reliability and Accuracy of Artificial Intelligence ChatGPT in Providing Information on Ophthalmic Diseases and Management to Patients

Overview
Journal: Eye (Lond)
Specialty: Ophthalmology
Date: 2024 Jan 20
PMID: 38245622
Abstract

Purpose: To assess the accuracy of ophthalmic information provided by an artificial intelligence chatbot (ChatGPT).

Methods: Five diseases from each of eight subspecialties of ophthalmology were assessed with ChatGPT version 3.5. For each disease, three questions were posed to ChatGPT: what is x?; how is x diagnosed?; how is x treated? (x = name of the disease). Responses were graded against the American Academy of Ophthalmology (AAO) guidelines for patients, with scores ranging from -3 (unvalidated and potentially harmful to a patient's health or well-being if pursued) to 2 (correct and complete).
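As a rough illustration of the questioning protocol described above, the sketch below builds the three templated prompts per disease and defines the rubric endpoints. The disease list and the grading placeholder are hypothetical, since the abstract does not list the diseases and responses were graded manually against the AAO patient guidelines rather than by code.

```python
# Minimal sketch of the three-question protocol (hypothetical helper names;
# the study queried ChatGPT 3.5 interactively and graded answers by hand).

QUESTION_TEMPLATES = [
    "What is {x}?",
    "How is {x} diagnosed?",
    "How is {x} treated?",
]

# Rubric endpoints from the abstract: -3 (unvalidated, potentially harmful)
# up to 2 (correct and complete).
VALID_SCORES = {-3, -2, -1, 0, 1, 2}


def build_prompts(disease: str) -> list[str]:
    """Return the three templated questions for one disease."""
    return [template.format(x=disease) for template in QUESTION_TEMPLATES]


def grade_response(response: str) -> int:
    """Placeholder for manual grading against AAO patient guidelines."""
    raise NotImplementedError("Graded by reviewers in the study, not by code")


if __name__ == "__main__":
    # Example diseases only, not the study's actual list.
    for disease in ["glaucoma", "cataract"]:
        for prompt in build_prompts(disease):
            print(prompt)
```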

Main Outcomes: Accuracy of ChatGPT responses to prompts about ophthalmic health information, expressed as scores on a scale from -3 to 2.

Results: Of the 120 questions, 93 (77.5%) scored ≥ 1 and 27 (22.5%) scored ≤ -1; among the latter, 9 (7.5%) received a score of -3. The overall median score across all subspecialties was 2 for the question "What is x", 1.5 for "How is x diagnosed", and 1 for "How is x treated", although this difference did not reach significance on Kruskal-Wallis testing.
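For readers who want to reproduce the reported percentages and the nonparametric comparison, a minimal sketch follows. The per-question-type score lists are made-up placeholders, since the abstract reports only counts and medians, not the raw scores.

```python
from scipy.stats import kruskal

# Percentages reported in the abstract, recomputed from the counts.
TOTAL = 120
print(f"scored >= 1: {93 / TOTAL:.1%}")   # 77.5%
print(f"scored <= -1: {27 / TOTAL:.1%}")  # 22.5%
print(f"scored -3: {9 / TOTAL:.1%}")      # 7.5%

# Hypothetical per-question-type scores (placeholders, not the study data).
# Kruskal-Wallis compares score distributions across the three question types.
what_is   = [2, 2, 1, 2, -1, 2]
diagnosed = [1, 2, 2, 1, -3, 2]
treated   = [1, 1, 0, 2, -1, -3]

stat, p = kruskal(what_is, diagnosed, treated)
print(f"H = {stat:.2f}, p = {p:.3f}")
```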

Conclusions: Despite the generally positive scores, ChatGPT on its own still provides incomplete, incorrect, and potentially harmful information about common ophthalmic conditions; harmful information was defined as the recommendation of invasive procedures or other interventions with potential for adverse sequelae that are not supported by the AAO for the disease in question. ChatGPT may be a valuable adjunct to patient education, but it is currently not sufficient without concomitant human medical supervision.

Citing Articles

Accuracy of Artificial Intelligence Versus Clinicians in Real-Life Case Scenarios of Retinopathy of Prematurity.

Belenje A, Pandya D, Jalali S, Rani P. Cureus. 2025; 17(2):e78597.

PMID: 40062070 PMC: 11889417. DOI: 10.7759/cureus.78597.


Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease.

Wang Y, Yang S, Zeng C, Xie Y, Shen Y, Li J. Front Med (Lausanne). 2025; 12:1546706.

PMID: 40041459 PMC: 11876178. DOI: 10.3389/fmed.2025.1546706.


The use of artificial intelligence based chat bots in ophthalmology triage.

David D, Zloto O, Katz G, Huna-Baron R, Vishnevskia-Dai V, Armarnik S. Eye (Lond). 2024; 39(4):785-789.

PMID: 39592814 PMC: 11885819. DOI: 10.1038/s41433-024-03488-1.


Comparing the Accuracy and Readability of Glaucoma-related Question Responses and Educational Materials by Google and ChatGPT.

Cohen S, Fisher A, Xu B, Song B. J Curr Glaucoma Pract. 2024; 18(3):110-116.

PMID: 39575130 PMC: 11576343. DOI: 10.5005/jp-journals-10078-1448.


Large language models in patient education: a scoping review of applications in medicine.

Aydin S, Karabacak M, Vlachos V, Margetis K. Front Med (Lausanne). 2024; 11:1477898.

PMID: 39534227 PMC: 11554522. DOI: 10.3389/fmed.2024.1477898.

