
Large Language Models for Mental Health Applications: Systematic Review

Overview
Date: 2024 Oct 18
PMID: 39423368
Abstract

Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate.

Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use.

Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed via PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. The search query was (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). This study included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English.
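For readers who want to reproduce the PubMed/MEDLINE arm of this search programmatically, a minimal sketch using Biopython's Entrez E-utilities wrapper is shown below. The email address, result limit, and date handling are illustrative assumptions; the review itself does not describe an API-based retrieval.

```python
# Illustrative sketch only: runs the review's search query against PubMed
# via Biopython's Entrez wrapper, restricted to the stated inclusion window.
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # hypothetical address; NCBI requires one

query = ("(mental health OR mental illness OR mental disorder OR psychiatry) "
         "AND (large language models)")

# Limit by publication date to the review's window: Jan 1, 2017 - Apr 30, 2024.
handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",
    mindate="2017/01/01",
    maxdate="2024/04/30",
    retmax=500,  # assumed cap on returned records for screening
)
record = Entrez.read(handle)
handle.close()

print(f"Records found: {record['Count']}")
print(record["IdList"][:10])  # first 10 PMIDs to pass on to title/abstract screening
```

Language and date-range filtering for the other databases (IEEE Xplore, Scopus, JMIR, ACM Digital Library) would need to be applied in each platform's own interface, as they are not covered by the NCBI E-utilities.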

Results: In total, 40 articles were evaluated, including 15 (38%) articles on mental health conditions and suicidal ideation detection through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might surpass their benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework.

Conclusions: This systematic review examines the clinical applications of LLMs in mental health, highlighting their potential and inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. These ethical concerns include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, the rapid development of LLMs underscores their potential as valuable clinical aids, emphasizing the need for continued research and development in this area.

Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617.

Citing Articles

Empowering pediatric, adolescent, and young adult patients with cancer utilizing generative AI chatbots to reduce psychological burden and enhance treatment engagement: a pilot study.

Hasei J, Hanzawa M, Nagano A, Maeda N, Yoshida S, Endo M. Front Digit Health. 2025; 7:1543543.

PMID: 40070545 PMC: 11893593. DOI: 10.3389/fdgth.2025.1543543.


Automated Digital Safety Planning Interventions for Young Adults: Qualitative Study Using Online Co-design Methods.

Meyerhoff J, Popowski S, Lakhtakia T, Tack E, Kornfield R, Kruzan K. JMIR Form Res. 2025; 9:e69602.

PMID: 40009840 PMC: 11904377. DOI: 10.2196/69602.


Laypeople's Use of and Attitudes Toward Large Language Models and Search Engines for Health Queries: Survey Study.

Mendel T, Singh N, Mann D, Wiesenfeld B, Nov O. J Med Internet Res. 2025; 27:e64290.

PMID: 39946180 PMC: 11888097. DOI: 10.2196/64290.


Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals.

Levkovich I. Eur J Investig Health Psychol Educ. 2025; 15(1).

PMID: 39852192 PMC: 11765082. DOI: 10.3390/ejihpe15010009.


Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language.

Koltcov S, Surkov A, Koltsova O, Ignatenko V. PeerJ Comput Sci. 2024; 10:e2395.

PMID: 39650532 PMC: 11623104. DOI: 10.7717/peerj-cs.2395.

