Multimodal Deep Learning for Dementia Classification Using Text and Audio

Overview
Journal Sci Rep
Specialty Science
Date 2024 Jun 16
PMID 38880810
Abstract

Dementia is a progressive neurological disorder that affects the daily lives of older adults, impairing their verbal communication and cognitive function. Early diagnosis is important for extending the lifespan and improving the quality of life of affected individuals, yet diagnosing dementia is a complex process. Machine learning solutions that draw on multiple types of data have the potential to improve automated dementia screening. In this study, we build deep learning models to distinguish dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC; with data augmentation, they achieve around 80% accuracy and 90% AUROC. We do not observe significant improvements in performance when audio or timestamp information is added to the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
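The four dataset versions described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the synonym table, the `min_words` threshold, the replacement probability `p`, and the function names `drop_short`, `augment`, and `build_versions` are all assumptions made for illustration, since the abstract does not specify how short sentences were defined or which synonym source was used.

```python
import random

# Hypothetical toy synonym table (assumption); a real pipeline would
# likely draw synonyms from a lexical resource instead.
SYNONYMS = {
    "boy": ["lad", "kid"],
    "stool": ["chair"],
    "mother": ["mom", "woman"],
    "falling": ["tipping"],
}


def drop_short(sentences, min_words=3):
    """Keep sentences with at least min_words whitespace-separated tokens.

    The threshold of 3 is an assumed value, not taken from the study.
    """
    return [s for s in sentences if len(s.split()) >= min_words]


def augment(sentence, rng, p=0.5):
    """Swap each word that has a known synonym with probability p."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def build_versions(sentences, seed=0):
    """Return the four dataset versions listed in the abstract."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    filtered = drop_short(sentences)
    return {
        "original": list(sentences),
        "filtered": filtered,
        "original_augmented": list(sentences) + [augment(s, rng) for s in sentences],
        "filtered_augmented": filtered + [augment(s, rng) for s in filtered],
    }


if __name__ == "__main__":
    sample = [
        "the boy is on the stool",
        "uh huh",
        "the mother is washing dishes",
    ]
    for name, data in build_versions(sample).items():
        print(name, len(data))
```

In this sketch, augmentation appends one synonym-substituted copy of each sentence to the data, so the augmented versions are twice the size of their sources. Whether the study added or replaced examples, and how many variants it generated per sentence, is not stated in the abstract.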
