
Automatic Vocal Tract Landmark Localization from Midsagittal MRI Data

Overview
Journal Sci Rep
Specialty Science
Date 2020 Feb 1
PMID 32001739
Citations 6
Abstract

The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images of 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the other methods, leading to an overall Root Mean Square Error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation code is also shared publicly on GitHub.
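The evaluation protocol named in the abstract, a leave-one-out procedure over the speakers scored by Root Mean Square Error, can be sketched as follows. This is a minimal illustration and not the authors' shared implementation: the function names, the toy coordinates, and the choice to compute the RMSE over per-landmark Euclidean distances are assumptions, and the 0.1 cm pixel size is only inferred from the reported equivalence of 3.6 pixels and 0.36 cm.

```python
import math

# Assumption: pixel size inferred from "3.6 pixels/0.36 cm" in the abstract.
PIXEL_SIZE_CM = 0.1


def landmark_rmse(predicted, reference):
    """RMSE of the Euclidean distances between paired (x, y) landmarks, in pixels."""
    squared = [
        (px - rx) ** 2 + (py - ry) ** 2
        for (px, py), (rx, ry) in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def leave_one_speaker_out(speakers):
    """Yield (held_out, training) pairs, with one fold per speaker."""
    for held_out in speakers:
        yield held_out, [s for s in speakers if s != held_out]


# Nine hypothetical speaker identifiers, matching the dataset size in the abstract.
speakers = [f"speaker_{i}" for i in range(1, 10)]
folds = list(leave_one_speaker_out(speakers))

# Toy landmarks (not real data): one landmark is off by 5 pixels, one is exact.
reference = [(10.0, 10.0), (20.0, 20.0)]
predicted = [(13.0, 14.0), (20.0, 20.0)]

rmse_px = landmark_rmse(predicted, reference)
rmse_cm = rmse_px * PIXEL_SIZE_CM
print(f"{len(folds)} folds, RMSE = {rmse_px:.2f} px = {rmse_cm:.2f} cm")
```

In a full evaluation, a model would be trained on the eight training speakers of each fold and the squared errors would be pooled over all held-out images and landmarks before taking the square root; holding out whole speakers keeps images of the test speaker out of training.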

Citing Articles

An automatic tracking method to measure the mandibula movement during real time MRI.

Mouchoux J, Sojka F, Kauffmann P, Dechent P, Meyer-Marcotty P, Quast A. Sci Rep. 2024; 14(1):24125.

PMID: 39406788 PMC: 11480379. DOI: 10.1038/s41598-024-74285-9.


AI-assisted automatic MRI-based tongue volume evaluation in motor neuron disease (MND).

Vernikouskaya I, Muller H, Ludolph A, Kassubek J, Rasche V. Int J Comput Assist Radiol Surg. 2024; 19(8):1579-1587.

PMID: 38536565 PMC: 11329588. DOI: 10.1007/s11548-024-03099-x.


Reinforcement learning in medical image analysis: Concepts, applications, challenges, and future directions.

Hu M, Zhang J, Matkovic L, Liu T, Yang X. J Appl Clin Med Phys. 2023; 24(2):e13898.

PMID: 36626026 PMC: 9924115. DOI: 10.1002/acm2.13898.


3D Dynamic Spatiotemporal Atlas of the Vocal Tract during Consonant-Vowel Production from 2D Real Time MRI.

Douros I, Xie Y, Dourou C, Isaieva K, Vuissoz P, Felblinger J. J Imaging. 2022; 8(9).

PMID: 36135393 PMC: 9504642. DOI: 10.3390/jimaging8090227.


Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut.

Wrench A, Balch-Tomes J. Sensors (Basel). 2022; 22(3).

PMID: 35161879 PMC: 8838804. DOI: 10.3390/s22031133.

