Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals

Overview
Journal Interspeech
Date 2023 Dec 4
PMID 38045821
Abstract

Speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, but they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that uses a non-uniform mechanism of adversarial speaker-identification (SID) loss maximization. This is achieved by varying the adversarial weight between different layers of a model during training. We find that a greater adversarial weight for the initial layers leads to performance improvement. Our approach using the ECAPA-TDNN model achieves an F1-score of 0.7349 (a 3.7% improvement over audio-only SOTA) on the DAIC-WoZ dataset, while simultaneously reducing the speaker-identification accuracy by 50%. Our findings suggest that depression can be identified from speech signals without placing undue reliance on a speaker's identity, paving the way for privacy-preserving approaches to depression detection.
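The layer-wise weighting described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the linear schedule, the function names `layer_adversarial_weights` and `combine_gradients`, and the toy gradient values are all assumptions made for illustration. The only ideas taken from the abstract are that each layer gets its own adversarial weight, that the SID loss is maximized while the depression loss is minimized, and that earlier layers receive the larger weight.

```python
from typing import List


def layer_adversarial_weights(num_layers: int, lam_first: float, lam_last: float) -> List[float]:
    """Assign one adversarial weight per layer.

    Assumption: a linear schedule from lam_first (initial layer) to
    lam_last (final layer). The abstract only states that the weight
    varies between layers, not how.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [lam_first]
    step = (lam_last - lam_first) / (num_layers - 1)
    return [lam_first + i * step for i in range(num_layers)]


def combine_gradients(
    dep_grads: List[List[float]],
    sid_grads: List[List[float]],
    weights: List[float],
) -> List[List[float]]:
    """Per-layer descent direction for a non-uniform adversarial objective.

    For layer l with weight lambda_l:
        g_l = dL_dep/dtheta_l - lambda_l * dL_sid/dtheta_l
    Descending along g_l lowers the depression loss and raises the SID
    loss, with the strength of the adversarial term set per layer.
    """
    if not (len(dep_grads) == len(sid_grads) == len(weights)):
        raise ValueError("expected one gradient list and one weight per layer")
    return [
        [g_dep - lam * g_sid for g_dep, g_sid in zip(dep_layer, sid_layer)]
        for dep_layer, sid_layer, lam in zip(dep_grads, sid_grads, weights)
    ]


# Toy example: three layers, larger adversarial weight on the initial layers.
weights = layer_adversarial_weights(num_layers=3, lam_first=1.0, lam_last=0.0)

# Hypothetical per-layer gradients of the two losses (two parameters per layer).
dep_grads = [[0.5, -0.25], [0.5, 0.0], [-0.25, 0.5]]
sid_grads = [[0.25, 0.25], [0.5, -0.5], [0.25, 0.25]]

combined = combine_gradients(dep_grads, sid_grads, weights)
for layer_index, (lam, grad) in enumerate(zip(weights, combined)):
    print(f"layer {layer_index}: lambda={lam:.2f}, combined gradient={grad}")
```

With a weight of zero on the last layer, that layer is updated by the depression loss alone, while the first layer feels the full adversarial term. In an autograd framework the same effect is usually obtained with a gradient-reversal step scaled per layer, rather than by combining gradients by hand.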

Citing Articles

Speechformer-CTC: Sequential Modeling of Depression Detection with Speech Temporal Classification.

Wang J, Ravi V, Flint J, Alwan A. Speech Commun. 2024; 163.

PMID: 39364289 PMC: 11449263. DOI: 10.1016/j.specom.2024.103106.


A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.

Ravi V, Wang J, Flint J, Alwan A. CEUR Workshop Proc. 2024; 3649:57-63.

PMID: 38650610 PMC: 11034881.


Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement.

Ravi V, Wang J, Flint J, Alwan A. Comput Speech Lang. 2024; 86.

PMID: 38313320 PMC: 10836190. DOI: 10.1016/j.csl.2023.101605.


Transformative Potential of AI in Healthcare: Definitions, Applications, and Navigating the Ethical Landscape and Public Perspectives.

Bekbolatova M, Mayer J, Ong C, Toma M. Healthcare (Basel). 2024; 12(2).

PMID: 38255014 PMC: 10815906. DOI: 10.3390/healthcare12020125.
