A Novel Silent Speech Recognition Approach Based on Parallel Inception Convolutional Neural Network and Mel Frequency Spectral Coefficient

Overview

Journal: Front Neurorobot

Date: 2022 Sep 19

PMID: 36119717

Authors

Jinghan Wu

Yakun Zhang

Liang Xie

Ye Yan

Xu Zhang

Shuang Liu

Xingwei An

Erwei Yin

Dong Ming

Abstract

Silent speech recognition overcomes the limitations of automatic speech recognition when acoustic signals cannot be produced or captured clearly, but it still has a long way to go before it is ready for real-life applications. To address this issue, we propose a novel silent speech recognition framework based on surface electromyography (sEMG) signals. Our system implements a new deep learning architecture, the Parallel Inception Convolutional Neural Network (PICNN), in which six inception modules process six channels of sEMG data separately and simultaneously. In addition, Mel Frequency Spectral Coefficients (MFSCs) are employed for the first time to extract speech-related sEMG features. We further design and generate a 100-class dataset covering the daily life assistance demands of elderly and disabled individuals. Experimental results obtained from 28 subjects show that our silent speech recognition method outperforms state-of-the-art machine learning algorithms and deep learning architectures, achieving a best recognition accuracy of 90.76%. Using sEMG data collected from four new subjects, we apply subject-based transfer learning to further improve the cross-subject recognition ability of the proposed model. These promising results indicate that our sEMG-based silent speech recognition system could achieve high recognition accuracy and steady performance in practical applications.
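The MFSC feature step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that MFSCs are log mel filterbank energies (the usual definition, i.e. MFCCs without the final cosine transform), uses the common mel formula 2595·log10(1 + f/700), and picks the sampling rate, frame length, hop size, and filter count purely for illustration. The six synthetic sine-wave channels stand in for six sEMG channels, and all function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points: each triangle uses three consecutive points
    # as its left edge, centre, and right edge.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for k in range(1, n_filters + 1):
        left, centre, right = edges[k - 1], edges[k], edges[k + 1]
        row = []
        for b in range(n_bins):
            f = b * sample_rate / n_fft  # centre frequency of DFT bin b
            if left < f <= centre:
                row.append((f - left) / (centre - left))
            elif centre < f < right:
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mfsc(signal, sample_rate, frame_len=64, hop=32, n_filters=8):
    """Log mel filterbank energies per frame: a frames x filters matrix."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) for empty bands.
        features.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                         for row in bank])
    return features


def mfsc_per_channel(channels, sample_rate, **kwargs):
    """Extract features for each channel independently (one map per channel)."""
    return [mfsc(ch, sample_rate, **kwargs) for ch in channels]


# Six synthetic channels (assumed 1000 Hz sampling), one test tone each.
tones = [50, 130, 210, 290, 370, 450]
channels = [[math.sin(2 * math.pi * f * t / 1000) for t in range(256)]
            for f in tones]
feats = mfsc_per_channel(channels, 1000)
print(len(feats), len(feats[0]), len(feats[0][0]))  # 6 7 8
```

Each channel yields its own time-by-frequency feature map, which matches the abstract's description of six channels being handled separately; in a PICNN-style model each map would be fed to its own inception branch before the branch outputs are combined.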

Citing Articles

MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion.

Khan M, Tran P, Pham N, Saddik A, Othmani A. Sci Rep. 2025; 15(1):5473.

PMID: 39953105 PMC: 11829003. DOI: 10.1038/s41598-025-89202-x.

Electrode Setup for Electromyography-Based Silent Speech Interfaces: A Pilot Study.

Salomons I, Del Blanco E, Navas E, Hernaez I. Sensors (Basel). 2025; 25(3).

PMID: 39943420 PMC: 11821129. DOI: 10.3390/s25030781.

Human-machine interface for two-dimensional steering control with the auricular muscles.

Pinheiro D, Faber J, Micera S, Shokur S. Front Neurorobot. 2023; 17:1154427.

PMID: 37342389 PMC: 10277645. DOI: 10.3389/fnbot.2023.1154427.

Multimodal transformer augmented fusion for speech emotion recognition.

Wang Y, Gu Y, Yin Y, Han Y, Zhang H, Wang S. Front Neurorobot. 2023; 17:1181598.

PMID: 37283784 PMC: 10239840. DOI: 10.3389/fnbot.2023.1181598.
