Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer
Importance: Consumers are increasingly using artificial intelligence (AI) chatbots as a source of information. However, the quality of the cancer information generated by these chatbots has not yet been evaluated using validated instruments.
Objective: To characterize the quality of information and presence of misinformation about skin, lung, breast, colorectal, and prostate cancers generated by 4 AI chatbots.
Design, Setting, and Participants: This cross-sectional study assessed AI chatbots' text responses to the 5 most commonly searched queries related to the 5 most common cancers using validated instruments. Search data were extracted from the publicly available Google Trends platform, and identical prompts were used to generate responses from 4 AI chatbots: ChatGPT version 3.5 (OpenAI), Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft).
Exposures: Google Trends' top 5 search queries related to skin, lung, breast, colorectal, and prostate cancer from January 1, 2021, to January 1, 2023, were input into 4 AI chatbots.
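For illustration only, a comparable top-query extraction could be scripted with the unofficial pytrends library; the keyword, timeframe string, and library choice below are assumptions rather than the authors' documented method, since the study used the public Google Trends website.

# Hypothetical sketch: pull top Google Trends queries related to one cancer over the
# study window (2021-01-01 to 2023-01-01). The library and keyword are assumptions.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(["skin cancer"], timeframe="2021-01-01 2023-01-01")

related = pytrends.related_queries()          # dict keyed by search term
top_queries = related["skin cancer"]["top"]   # DataFrame of top related queries
print(top_queries.head(5))                    # the 5 most commonly searched related queries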
Main Outcomes and Measures: The primary outcomes were the quality of consumer health information based on the validated DISCERN instrument (scores from 1 [low] to 5 [high] for quality of information) and the understandability and actionability of this information based on the understandability and actionability domains of the Patient Education Materials Assessment Tool (PEMAT) (scores of 0%-100%, with higher scores indicating a higher level of understandability and actionability). Secondary outcomes included misinformation scored using a 5-point Likert scale (scores from 1 [no misinformation] to 5 [high misinformation]) and readability assessed using the Flesch-Kincaid Grade Level score.
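For reference, the readability outcome relies on the standard Flesch-Kincaid Grade Level formula, and each PEMAT domain score is the percentage of applicable items rated "Agree." The sketch below illustrates both calculations; the function names and example counts are hypothetical and not taken from the study.

# Minimal illustration of the readability and PEMAT scoring described above.
# Function names and example counts are hypothetical, not the study's code.

def flesch_kincaid_grade(words, sentences, syllables):
    # Standard Flesch-Kincaid Grade Level formula.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def pemat_domain_score(items_agreed, items_applicable):
    # PEMAT understandability or actionability: percent of applicable items rated "Agree".
    return 100.0 * items_agreed / items_applicable

# Made-up counts: a 200-word, 8-sentence response with 360 syllables scores at
# roughly grade 15 (college level); 8 of 12 applicable PEMAT items gives 66.7%.
print(round(flesch_kincaid_grade(200, 8, 360), 1))   # ~15.4
print(round(pemat_domain_score(8, 12), 1))           # 66.7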
Results: The analysis included 100 responses from 4 chatbots about the 5 most common search queries for skin, lung, breast, colorectal, and prostate cancer. The quality of text responses generated by the 4 AI chatbots was good (median [range] DISCERN score, 5 [2-5]) and no misinformation was identified. Understandability was moderate (median [range] PEMAT Understandability score, 66.7% [33.3%-90.1%]), and actionability was poor (median [range] PEMAT Actionability score, 20.0% [0%-40.0%]). The responses were written at the college level based on the Flesch-Kincaid Grade Level score.
Conclusions and Relevance: Findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used as a supplementary resource rather than as a primary source of medical information.