A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement

Overview
Journal Interspeech
Date 2022 Nov 7
PMID 36341467
Abstract

Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The depression classification model is trained to be invariant to speaker identity by minimizing the depression prediction loss while maximizing the speaker prediction loss. The effectiveness of the proposed method is demonstrated on two datasets, DAIC-WOZ (English) and CONVERGE (Mandarin), with three feature sets (Mel-spectrograms, raw audio signals, and the last hidden state of Wav2vec2.0), using a modified DepAudioNet model. With adversarial training, depression classification improves for every feature set when compared to the baseline. Wav2vec2.0 features with adversarial learning yielded the best performance (F1-score of 69.2% for DAIC-WOZ and 91.5% for CONVERGE). Analysis of the class-separability measure (J-ratio) of the hidden states of the DepAudioNet model shows that, when adversarial learning is applied, the backend model loses some speaker discriminability while its depression discriminability improves. These results indicate that some components of speaker identity may not be useful for depression detection, and that minimizing their effects provides a more accurate diagnosis of the underlying disorder and can safeguard a speaker's identity.
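
The training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two losses are combined, so the weighted difference below, the weight `lam`, and the function names are assumptions made for illustration and are not the authors' implementation.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted class probabilities."""
    return -math.log(probs[label])


def adversarial_objective(dep_probs, dep_label, spk_probs, spk_label, lam=0.1):
    """Hypothetical combined objective for the depression model.

    Minimizing this value lowers the depression loss and raises the
    speaker loss. `lam` (an assumed hyperparameter) sets the trade-off.
    """
    loss_dep = cross_entropy(dep_probs, dep_label)
    loss_spk = cross_entropy(spk_probs, spk_label)
    return loss_dep - lam * loss_spk


# Same depression prediction in both cases; only the speaker prediction differs.
# "leaky": the speaker is easy to identify from the representation.
leaky = adversarial_objective([0.2, 0.8], 1, [0.9, 0.05, 0.05], 0)
# "invariant": the speaker prediction is at chance over three speakers.
invariant = adversarial_objective([0.2, 0.8], 1, [1 / 3, 1 / 3, 1 / 3], 0)

print(invariant < leaky)  # the speaker-invariant case scores lower (better)
```

In a neural network, an objective of this kind is often realized with a gradient reversal layer between the shared encoder and the speaker classifier; whether the paper does so is not stated in the abstract.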

Citing Articles

Speechformer-CTC: Sequential Modeling of Depression Detection with Speech Temporal Classification.

Wang J, Ravi V, Flint J, Alwan A. Speech Commun. 2024; 163.

PMID: 39364289 PMC: 11449263. DOI: 10.1016/j.specom.2024.103106.


Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis.

Liu L, Liu L, Wafa H, Tydeman F, Xie W, Wang Y. J Am Med Inform Assoc. 2024; 31(10):2394-2404.

PMID: 39013193 PMC: 11413444. DOI: 10.1093/jamia/ocae189.


Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments.

Zhang X, Zhang X, Chen W, Li C, Yu C. Sci Rep. 2024; 14(1):9543.

PMID: 38664511 PMC: 11045867. DOI: 10.1038/s41598-024-60278-1.


A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.

Ravi V, Wang J, Flint J, Alwan A. CEUR Workshop Proc. 2024; 3649:57-63.

PMID: 38650610 PMC: 11034881.


Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement.

Ravi V, Wang J, Flint J, Alwan A. Comput Speech Lang. 2024; 86.

PMID: 38313320 PMC: 10836190. DOI: 10.1016/j.csl.2023.101605.

