PMID: 39606472

Disagreements in Medical Ethics Question Answering Between Large Language Models and Physicians

Abstract

Importance: Medical ethics is inherently complex, shaped by a broad spectrum of opinions, experiences, and cultural perspectives. The integration of large language models (LLMs) into healthcare is recent and requires an understanding of how consistently they adhere to ethical standards.

Objective: To compare agreement rates in answering questions about ethically ambiguous situations among three frontier LLMs (GPT-4, Gemini-pro-1.5, and Llama-3-70b) and a multidisciplinary physician group.

Methods: In this cross-sectional study, three LLMs generated 1,248 medical ethics questions. These questions were derived from the principles outlined in the American College of Physicians Ethics Manual, and their topics spanned traditional, inclusive, interdisciplinary, and contemporary themes. Each model was then tasked with answering all generated questions. Twelve practicing physicians evaluated and responded to a randomly selected 10% subset of these questions. We compared agreement rates in question answering among the physicians, between the physicians and LLMs, and among the LLMs.
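The study does not publish its scoring code; as a minimal sketch, a pairwise agreement rate of the kind compared here can be computed as the mean fraction of identical answers across all pairs of responders. The responder names and answers below are purely illustrative, not study data:

```python
from itertools import combinations

def pairwise_agreement(answers):
    """Mean fraction of identical answers across all responder pairs.

    `answers` maps responder -> list of answers in the same question order.
    """
    pair_rates = []
    for a, b in combinations(answers, 2):
        matches = sum(x == y for x, y in zip(answers[a], answers[b]))
        pair_rates.append(matches / len(answers[a]))
    # Average the per-pair agreement fractions.
    return sum(pair_rates) / len(pair_rates)

# Hypothetical answers for illustration only (not the study's data):
llm_answers = {
    "GPT-4":          ["yes", "no",  "yes", "no"],
    "Gemini-pro-1.5": ["yes", "no",  "no",  "no"],
    "Llama-3-70b":    ["yes", "yes", "yes", "no"],
}
```

The same function applies unchanged to physician-physician or physician-LLM pairs, which is what makes the three reported agreement rates directly comparable.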

Results: The models generated a total of 3,744 answers. Although physicians rated the questions' complexity as moderate, scoring them between 2 and 3 on a 5-point scale, their agreement rate was only 55.9%. Agreement between physicians and LLMs was similarly low at 57.9%. In contrast, the agreement rate among LLMs was notably higher at 76.8% (p < 0.001), indicating greater consistency among LLM responses than in either physician-physician or physician-LLM agreement.

Conclusions: LLMs demonstrate higher agreement rates in ethically complex scenarios compared to physicians, suggesting their potential utility as consultants in ambiguous ethical situations. Future research should explore how LLMs can enhance consistency while adapting to the complexities of real-world ethical dilemmas.
