
Speaker-Independent Silent Speech Recognition from Flesh-Point Articulatory Movements Using an LSTM Neural Network

Overview
Date 2018 Oct 2
PMID 30271809
Citations 16
Abstract

Silent speech recognition (SSR) converts non-audio information, such as articulatory movements, into text. SSR has the potential to enable people who have undergone laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models, and the high degree of variability in articulatory patterns across speakers has been a barrier to developing effective speaker-independent approaches. Speaker-independent SSR approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on the tongue and lips, using articulatory normalization methods that reduce inter-speaker variation. To minimize the physiological differences of the articulators across speakers, we propose Procrustes matching-based articulatory normalization, which removes locational, rotational, and scaling differences. To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression (fMLLR) and i-vectors. We adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as the articulatory model, which can effectively model articulatory movements with long-range articulatory history. A silent speech data set with flesh-point movements was collected with an electromagnetic articulograph (EMA) from twelve healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches on both healthy and laryngectomized speakers. In addition, the BLSTM outperformed a standard deep neural network; the best performance was obtained by the BLSTM with all three normalization approaches combined.
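
The Procrustes matching step described above has a standard closed-form solution. The NumPy sketch below aligns one speaker's 2-D flesh-point coordinates to a reference shape by removing translation (centering), scale (unit-norm rescaling), and rotation (the orthogonal Procrustes problem, solved via SVD). This is a minimal illustration of the general technique under stated assumptions, not the paper's implementation: the function name, the unit-norm scaling convention, and the choice of reference shape are assumptions.

```python
import numpy as np

def procrustes_normalize(points, reference):
    """Align 2-D flesh-point coordinates (n_points, 2) to a reference
    shape by removing translation, scale, and rotation differences.
    Hypothetical helper: a sketch of classical Procrustes matching,
    not the paper's exact procedure.
    """
    # Remove locational differences: center both shapes at the origin.
    p = points - points.mean(axis=0)
    r = reference - reference.mean(axis=0)

    # Remove scaling differences: rescale each shape to unit Frobenius norm.
    p = p / np.linalg.norm(p)
    r = r / np.linalg.norm(r)

    # Remove rotational differences: solve the orthogonal Procrustes
    # problem min_R ||p @ R - r|| via SVD of the cross-covariance.
    u, _, vt = np.linalg.svd(p.T @ r)
    # Constrain to a proper rotation (det = +1) so the shape is not mirrored.
    d = np.sign(np.linalg.det(u) * np.linalg.det(vt))
    rotation = u @ np.diag([1.0, d]) @ vt
    return p @ rotation
```

In a speaker-independent setting, one natural choice (an assumption here, not stated in the abstract) is to use the mean shape of the training speakers as `reference`, estimate the transform once per speaker, and apply it to every frame of that speaker's data.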
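
For the articulatory model, a bidirectional LSTM reads the normalized feature sequence in both directions, so each frame's output can draw on long-range context from both past and future frames. The PyTorch sketch below shows the overall shape of such a model; PyTorch itself, the class name, and all layer sizes are illustrative assumptions, since the abstract does not specify the toolkit or the architecture's dimensions.

```python
import torch
import torch.nn as nn

class ArticulatoryBLSTM(nn.Module):
    """Sketch of a BLSTM mapping articulatory feature frames to
    per-frame phonetic class scores. All sizes are illustrative."""
    def __init__(self, n_features, n_classes, hidden=256, layers=3):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        # Forward and backward hidden states are concatenated: 2 * hidden.
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):          # x: (batch, time, n_features)
        h, _ = self.blstm(x)       # h: (batch, time, 2 * hidden)
        return self.out(h)         # per-frame class scores

# Example: 8 utterances of 100 frames, 24 articulatory features,
# 40 output classes (all numbers hypothetical).
model = ArticulatoryBLSTM(n_features=24, n_classes=40)
scores = model(torch.randn(8, 100, 24))   # -> (8, 100, 40)
```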

Citing Articles

Automated sentiment analysis of visually impaired students' audio feedback in virtual learning environments.

Elbourhamy D. PeerJ Comput Sci. 2024; 10:e2143.

PMID: 38983237 PMC: 11232573. DOI: 10.7717/peerj-cs.2143.


Inter-patient ECG heartbeat classification for arrhythmia classification: a new approach of multi-layer perceptron with weight capsule and sequence-to-sequence combination.

Zhou C, Li X, Feng F, Zhang J, Lyu H, Wu W. Front Physiol. 2023; 14:1247587.

PMID: 37841320 PMC: 10569428. DOI: 10.3389/fphys.2023.1247587.


Prediction of outpatients with conjunctivitis in Xinjiang based on LSTM and GRU models.

Wang Y, Yi X, Luo M, Wang Z, Qin L, Hu X. PLoS One. 2023; 18(9):e0290541.

PMID: 37733673 PMC: 10513229. DOI: 10.1371/journal.pone.0290541.


MagTrack: A Wearable Tongue Motion Tracking System for Silent Speech Interfaces.

Cao B, Ravi S, Sebkhi N, Bhavsar A, Inan O, Xu W. J Speech Lang Hear Res. 2023; 66(8S):3206-3221.

PMID: 37146629 PMC: 10555459. DOI: 10.1044/2023_JSLHR-22-00319.


Epidemiological characteristics, spatial clusters and monthly incidence prediction of hand, foot and mouth disease from 2017 to 2022 in Shanxi Province, China.

Ma Y, Xu S, Dong A, An J, Qin Y, Yang H. Epidemiol Infect. 2023; 151:e54.

PMID: 37039461 PMC: 10126901. DOI: 10.1017/S0950268823000389.

