Predicting Satisfaction With Chat-Counseling at a 24/7 Chat Hotline for the Youth: Natural Language Processing Study

Overview
Journal JMIR AI
Publisher JMIR Publications
Date 2025 Feb 18
PMID 39965198
Abstract

Background: Chat-based counseling services are popular for providing low-threshold mental health support to youth. Because these interactions are text-based, such services are also particularly well suited to natural language processing (NLP) methods that could improve the provision of care.

Objective: This paper evaluates the feasibility of one such use case: the NLP-based automated evaluation of help seekers' satisfaction with the chat interaction. This preregistered approach could support evaluation and quality-control procedures, which are particularly relevant for such services.

Methods: Consultations from 2609 young chatters (around 140,000 messages) and the corresponding feedback were used to train and evaluate classifiers that predict whether a chat was perceived as helpful. First, we trained a word vectorizer combined with an extreme gradient boosting (XGBoost) classifier, using cross-validation and extensive hyperparameter tuning. Second, we trained several transformer-based models, comparing model types, preprocessing steps, and over- and undersampling techniques. For each model family, the best-performing approach on the training set was selected for a final performance evaluation on the 522 users in the held-out test set.
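The Methods paragraph pairs a word vectorizer with a classifier and compares over- and undersampling to handle class imbalance. The abstract does not specify which vectorizer or resampling variants were used, so the following is only a minimal, standard-library sketch under stated assumptions: a plain bag-of-words count vectorizer with lowercase whitespace tokenization, and random (not synthetic) over- and undersampling. The function names `build_vocabulary`, `vectorize`, `random_undersample`, and `random_oversample` are hypothetical and do not come from the study.

```python
import random
from collections import Counter


def build_vocabulary(texts, min_count=1):
    """Map each token seen at least `min_count` times to a column index (assumed whitespace tokenizer)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(text, vocab):
    """Turn one chat transcript into a bag-of-words count vector; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples so every class has as many as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_size = min(counts.values())
    kept = []
    for label in counts:
        idx = [i for i, y in enumerate(labels) if y == label]
        kept.extend(rng.sample(idx, minority_size))
    kept.sort()  # keep the original ordering of the surviving examples
    return [texts[i] for i in kept], [labels[i] for i in kept]


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples (with replacement) until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, n in counts.items():
        idx = [i for i, y in enumerate(labels) if y == label]
        for i in rng.choices(idx, k=majority_size - n):
            out_texts.append(texts[i])
            out_labels.append(labels[i])
    return out_texts, out_labels
```

In a setup like the one described, resampling would normally be applied only to the training folds, so that the test users keep their original class distribution and the reported metrics stay comparable.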

Results: The fine-tuned XGBoost classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.69 (P<.001) and a Matthews correlation coefficient (MCC) of 0.25 on the previously unseen test set. The selected Longformer-based model did not outperform this baseline, with an AUROC of 0.68 (P=.69). A Shapley additive explanations (SHAP) analysis suggested that help seekers who rated a consultation as helpful commonly expressed their satisfaction during the conversation itself. In contrast, rejecting offered exercises predicted perceived unhelpfulness.
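The two reported metrics can be sketched as follows. AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half), and MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), computed from hard predictions. This is a minimal standard-library sketch with hypothetical function names `auroc` and `mcc`; encoding "helpful" as 1 is an assumption, the pairwise AUROC loop is quadratic and meant for clarity rather than scale, and the sketch does not reproduce the significance tests behind the reported P values.

```python
import math


def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly, ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(labels, predictions):
    """Matthews correlation coefficient for binary labels and hard 0/1 predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike AUROC, MCC depends on the decision threshold used to turn classifier scores into hard predictions (for example, predicting 1 when the score is at least 0.5, an assumed cutoff), so the two metrics can rank models differently.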

Conclusions: Chat conversations contain relevant information about the perceived quality of an interaction, which NLP-based prediction approaches can exploit. However, randomized trials are required to determine whether the moderate predictive performance translates into meaningful service improvements. Further, our results highlight the importance of comparing pretrained models against simpler baselines to avoid implementing unnecessarily complex models.

Trial Registration: Open Science Framework SR4Q9; https://osf.io/sr4q9.
