
Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model

Overview
Date 2023 May 27
PMID 37237693
Abstract

Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway-labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired from three fast speech MRI protocols. Protocol 1: a 3 T-based radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T-based uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T-based variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and to a conventional U-NET model trained without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth.
Evaluations were performed using the quantitative Dice similarity coefficient, the Hausdorff distance metric, and a segmentation count metric. This approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20 images), and provided accurate segmentations similar to those of an expert human.
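For readers unfamiliar with the two overlap metrics named above, a minimal sketch of each is given below. This is an illustration only, not the authors' evaluation code; the function names and the representation of masks as sets of pixel coordinates are assumptions made for clarity.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks,
    each given as an iterable of (row, col) pixel coordinates.
    Returns 2|A∩B| / (|A|+|B|), in [0, 1]."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(pts_a, pts_b):
    """Symmetric Hausdorff distance between two 2D point sets
    (e.g., segmentation boundary points), using Euclidean distance."""
    def directed(p, q):
        # farthest point in p from its nearest neighbor in q
        return max(
            min(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 for (x2, y2) in q)
            for (x1, y1) in p
        )
    return max(directed(pts_a, pts_b), directed(pts_b, pts_a))


# Example: two masks sharing one of two pixels give Dice = 0.5,
# and point sets {(0,0)} vs {(3,4)} are Hausdorff distance 5 apart.
print(dice([(0, 0), (0, 1)], [(0, 1), (0, 2)]))   # 0.5
print(hausdorff([(0, 0)], [(3, 4)]))              # 5.0
```

A high Dice score indicates strong region overlap with the ground-truth segmentation, while a low Hausdorff distance indicates that no boundary point strays far from the reference contour; the two metrics are complementary, which is presumably why both are reported.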

Citing Articles

Estimating Palatal and Pharyngeal Muscle Contraction in Hindi Syllable Pronunciation using Computational Modeling.

Vathulya M, Sarkar S, Singh I, Prajapati T, Sharma P Indian J Plast Surg. 2025; 57(Suppl 1):S24-S29.

PMID: 39741722 PMC: 11684914. DOI: 10.1055/s-0044-1788591.


Multi-label deep learning for comprehensive optic nerve head segmentation through data of fundus images.

Kako N, Abdulazeez A, Abdulqader D Heliyon. 2024; 10(18):e36996.

PMID: 39309959 PMC: 11416576. DOI: 10.1016/j.heliyon.2024.e36996.


A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method.

Nobel S, Swapno S, Islam M, Safran M, Alfarhood S, Mridha M Sci Rep. 2024; 14(1):14435.

PMID: 38910146 PMC: 11758383. DOI: 10.1038/s41598-024-64987-5.

References
1.
Bresch E, Narayanan S. Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images. IEEE Trans Med Imaging. 2009; 28(3):323-38. PMC: 2718576. DOI: 10.1109/TMI.2008.928920.

2.
Ruthven M, Miquel M, King A. Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech. Comput Methods Programs Biomed. 2020; 198:105814. PMC: 7732702. DOI: 10.1016/j.cmpb.2020.105814.

3.
Burdumy M, Traser L, Richter B, Echternach M, Korvink J, Hennig J. Acceleration of MRI of the vocal tract provides additional insight into articulator modifications. J Magn Reson Imaging. 2015; 42(4):925-35. DOI: 10.1002/jmri.24857.

4.
Ha J, Sung I, Son J, Stone M, Ord R, Cho Y. Analysis of speech and tongue motion in normal and post-glossectomy speakers using cine MRI. J Appl Oral Sci. 2016; 24(5):472-480. PMC: 5083024. DOI: 10.1590/1678-775720150421.

5.
Echternach M, Sundberg J, Arndt S, Breyer T, Markl M, Schumacher M. Vocal tract and register changes analysed by real-time MRI in male professional singers: a pilot study. Logoped Phoniatr Vocol. 2008; 33(2):67-73. DOI: 10.1080/14015430701875653.