PMID: 36712557

Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation

Abstract

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
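The abstract does not spell out how the OAM is computed from the network posteriors, so the following is only a minimal sketch of one plausible posterior-based score, not the authors' definition. It assumes a classifier that emits one logit vector per CV token, and it scores a speaker by the mean posterior assigned to the intended consonant. The function names `softmax` and `target_posterior_score`, the four-class example, and the averaging rule are all illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw classifier logits into posterior probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_posterior_score(
    logits_per_token: Sequence[Sequence[float]],
    target_indices: Sequence[int],
) -> float:
    """Mean posterior of the intended consonant over all CV tokens.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    higher values mean the classifier more often 'hears' the consonant
    the speaker intended to produce.
    """
    if not logits_per_token or len(logits_per_token) != len(target_indices):
        raise ValueError("need one target index per non-empty token list")
    scores = [
        softmax(logits)[target]
        for logits, target in zip(logits_per_token, target_indices)
    ]
    return sum(scores) / len(scores)


# Example with made-up logits for a 4-consonant classifier.
clear_tokens = [[5.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]]
imprecise_tokens = [[1.0, 0.8, 0.9, 1.1], [0.5, 0.6, 0.4, 0.5]]
targets = [0, 1]

clear_score = target_posterior_score(clear_tokens, targets)
imprecise_score = target_posterior_score(imprecise_tokens, targets)
```

Under this assumed rule, a per-speaker score could then be compared against perceptual ratings, for example with a rank correlation, which is the kind of relationship the abstract reports.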

Citing Articles

Responsible development of clinical speech AI: Bridging the gap between clinical research and technology.

Berisha V, Liss J. NPJ Digit Med. 2024; 7(1):208.

PMID: 39122889 PMC: 11316053. DOI: 10.1038/s41746-024-01199-1.


Dysarthria detection based on a deep learning model with a clinically-interpretable layer.

Xu L, Liss J, Berisha V. JASA Express Lett. 2023; 3(1):015201.

PMID: 36725533 PMC: 9835557. DOI: 10.1121/10.0016833.
