
A Novel Silent Speech Recognition Approach Based on Parallel Inception Convolutional Neural Network and Mel Frequency Spectral Coefficient

Overview
Date 2022 Sep 19
PMID 36119717
Abstract

Silent speech recognition overcomes the limitations of automatic speech recognition when acoustic signals cannot be produced or captured clearly, but it still has a long way to go before it is ready for real-life applications. To address this issue, we propose a novel silent speech recognition framework based on surface electromyography (sEMG) signals. In our approach, a new deep learning architecture, the Parallel Inception Convolutional Neural Network (PICNN), is proposed and implemented in our silent speech recognition system, with six inception modules processing six channels of sEMG data separately and simultaneously. Meanwhile, Mel Frequency Spectral Coefficients (MFSCs) are employed to extract speech-related sEMG features for the first time. We further design and generate a 100-class dataset containing daily life assistance demands for elderly and disabled individuals. The experimental results obtained from 28 subjects confirm that our silent speech recognition method outperforms state-of-the-art machine learning algorithms and deep learning architectures, achieving a best recognition accuracy of 90.76%. With sEMG data collected from four new subjects, efficient subject-based transfer learning steps are conducted to further improve the cross-subject recognition ability of the proposed model. These promising results indicate that our sEMG-based silent speech recognition system could achieve high recognition accuracy and steady performance in practical applications.
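
As a rough illustration of the feature extraction step mentioned in the abstract, the sketch below computes MFSCs (log mel filterbank energies, that is, MFCCs without the final discrete cosine transform) independently for each of six sEMG channels, giving one time-frequency map per channel that a separate network branch could consume. The abstract does not state the sampling rate, frame length, hop size, or number of mel filters, so the values used here (1000 Hz, 256 samples, 128 samples, 24 filters) are assumptions, and the function names are hypothetical rather than taken from the authors' implementation.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to fs/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfsc(signal, fs=1000, frame_len=256, hop=128, n_filters=24):
    """Log mel filterbank energies of one channel, shape (n_frames, n_filters).

    All default parameter values are assumptions made for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Build overlapping frames with an index matrix, then apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    # Mel filterbank energies followed by a log; no DCT is applied.
    energies = power @ mel_filterbank(n_filters, frame_len, fs).T
    return np.log(energies + 1e-10)

def mfsc_per_channel(emg, **kwargs):
    """Apply mfsc() to each row of a (channels, samples) array."""
    return np.stack([mfsc(channel, **kwargs) for channel in emg])

# Synthetic stand-in for a six-channel sEMG recording (not real data).
rng = np.random.default_rng(0)
emg = rng.standard_normal((6, 2000))
features = mfsc_per_channel(emg)
print(features.shape)  # (6, 14, 24): channels x frames x mel filters
```

Keeping the channel axis separate, instead of concatenating all channels into one feature vector, mirrors the abstract's description of six modules each processing one channel; how the authors actually arrange and fuse these maps is not specified in the abstract.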

Citing Articles

MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion.

Khan M, Tran P, Pham N, Saddik A, Othmani A Sci Rep. 2025; 15(1):5473.

PMID: 39953105 PMC: 11829003. DOI: 10.1038/s41598-025-89202-x.


Electrode Setup for Electromyography-Based Silent Speech Interfaces: A Pilot Study.

Salomons I, Del Blanco E, Navas E, Hernaez I Sensors (Basel). 2025; 25(3).

PMID: 39943420 PMC: 11821129. DOI: 10.3390/s25030781.


Human-machine interface for two-dimensional steering control with the auricular muscles.

Pinheiro D, Faber J, Micera S, Shokur S Front Neurorobot. 2023; 17:1154427.

PMID: 37342389 PMC: 10277645. DOI: 10.3389/fnbot.2023.1154427.


Multimodal transformer augmented fusion for speech emotion recognition.

Wang Y, Gu Y, Yin Y, Han Y, Zhang H, Wang S Front Neurorobot. 2023; 17:1181598.

PMID: 37283784 PMC: 10239840. DOI: 10.3389/fnbot.2023.1181598.
