
Two-stage Data Augmentation for Improved ASR Performance for Dysarthric Speech

Overview
Journal: Comput Biol Med
Publisher: Elsevier
Date: 2025 Mar 14
PMID: 40086291
Abstract

Machine learning (ML) and deep neural networks (DNNs) have greatly advanced automatic speech recognition (ASR). However, accurate ASR for dysarthric speech remains a serious challenge, and the dearth of usable data continues to hinder the application of ML and DNN techniques to dysarthric speech recognition. In the current research, we address this challenge with a novel two-stage data augmentation scheme, a combination of static and dynamic augmentation techniques designed by leveraging an understanding of the characteristics of dysarthric speech. In the first stage, comprising the static augmentations, we train a speaker-independent ASR on healthy speech modified through various perturbations, devoicing of consonants, and voice conversion. In the second stage, a modified SpecAugment algorithm tailored to dysarthric speech, termed Dysarthric SpecAugment, is applied during training; it likewise exploits the characteristics of dysarthric speech. The resulting acoustic model is then used to pre-train a speaker-dependent ASR on dysarthric speech. The objective of this work is to improve ASR performance for dysarthric speech using the two-stage data augmentation scheme. An end-to-end ASR with a Transformer acoustic model is used to evaluate the scheme on speech from the UA dysarthric speech corpus. We achieve an absolute improvement of 10.7% and a relative improvement of 29.2% in word error rate (WER) over a baseline with no augmentation, for a final WER of 25.9% with the speaker-dependent system.
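To make the two-stage idea concrete, the sketch below illustrates it in Python with NumPy. The function names (speed_perturb, dysarthric_specaugment), the mask widths, and the choice of wider time masks are assumptions for illustration only; the abstract does not specify the authors' actual perturbations, devoicing, voice conversion, or Dysarthric SpecAugment policy.

```python
import numpy as np

def speed_perturb(waveform, rate):
    """Stage 1 (static, illustrative): naive speed perturbation by
    resampling the waveform. A rate below 1 stretches the signal,
    loosely mimicking the slower speaking rate often seen in
    dysarthric speech (this simple method also shifts pitch)."""
    n_out = int(len(waveform) / rate)
    old_idx = np.arange(len(waveform))
    new_idx = np.linspace(0, len(waveform) - 1, n_out)
    return np.interp(new_idx, old_idx, waveform)

def dysarthric_specaugment(spec, n_time_masks=2, max_time_width=40,
                           n_freq_masks=1, max_freq_width=15, rng=None):
    """Stage 2 (dynamic, illustrative): SpecAugment-style masking on a
    (freq, time) spectrogram. Time masks are made wider than frequency
    masks here as a guess at how masking might be tailored to the
    prolonged, slowly varying segments of dysarthric speech."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    fill = out.mean()
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w)))
        out[:, t0:t0 + w] = fill          # mask a span of frames
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w)))
        out[f0:f0 + w, :] = fill          # mask a band of frequency bins
    return out

# Example usage: static copies are generated once when building the
# augmented corpus; the masking is re-drawn for every mini-batch.
wav = np.random.randn(16000)              # 1 s of stand-in audio at 16 kHz
slow_wav = speed_perturb(wav, rate=0.9)   # stage 1: stretched copy
spec = np.abs(np.random.randn(80, 300))   # stand-in for a log-mel spectrogram
masked = dysarthric_specaugment(spec)     # stage 2: on-the-fly masking
```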