Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

Overview

Journal: PLoS Comput Biol

Specialty: Biology

Date: 2016 Nov 24

PMID: 27880768

Citations: 16

Authors

Florent Bocquelet

Thomas Hueber

Laurent Girin

Christophe Savariaux

Blaise Yvert

Abstract

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded from a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results pave the way for future speech BCI applications using such an articulatory-based speech synthesizer.
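
The processing chain described in the abstract (calibrate a new speaker's sensor positions toward the reference speaker, then map articulatory frames to acoustic parameters with a neural network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the affine least-squares calibration, the single tanh hidden layer, the random untrained weights, the synthetic data, and the dimensions `N_ART`, `N_ACOUSTIC`, and `HIDDEN`. The vocoder stage is omitted.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not from the paper).
N_ART, N_ACOUSTIC, HIDDEN = 14, 25, 32


def fit_affine_calibration(new_frames, ref_frames):
    """Fit a least-squares affine map from a new speaker's sensor
    coordinates to the reference speaker's coordinates (an assumed method)."""
    ones = np.ones((new_frames.shape[0], 1))
    X = np.hstack([new_frames, ones])            # append a bias column
    W, *_ = np.linalg.lstsq(X, ref_frames, rcond=None)
    return W                                     # shape (N_ART + 1, N_ART)


def apply_calibration(W, frames):
    """Map one frame or a batch of frames into the reference space."""
    frames = np.atleast_2d(frames)
    ones = np.ones((frames.shape[0], 1))
    return np.hstack([frames, ones]) @ W


def mlp_forward(x, layers):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
    W, b = layers[-1]
    return h @ W + b


rng = np.random.default_rng(0)

# Synthetic calibration data: the "new speaker" differs from the
# reference by an unknown affine transform that the fit should recover.
true_A = rng.normal(size=(N_ART, N_ART))
true_t = rng.normal(size=N_ART)
new_calib = rng.normal(size=(200, N_ART))
ref_calib = new_calib @ true_A + true_t
W_cal = fit_affine_calibration(new_calib, ref_calib)

# Untrained placeholder network standing in for the trained DNN.
layers = [
    (rng.normal(scale=0.1, size=(N_ART, HIDDEN)), np.zeros(HIDDEN)),
    (rng.normal(scale=0.1, size=(HIDDEN, N_ACOUSTIC)), np.zeros(N_ACOUSTIC)),
]

# One incoming articulatory frame -> calibrated frame -> acoustic parameters.
frame = rng.normal(size=N_ART)
acoustic = mlp_forward(apply_calibration(W_cal, frame), layers)
```

In a closed-loop setting, the last two lines would run once per incoming frame, with the resulting acoustic parameters passed to a vocoder; the calibration matrix is fitted once, during the short calibration period.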

Citing Articles

The ethical significance of user-control in AI-driven speech-BCIs: a narrative review.

van Stuijvenberg O, Samlal D, Vansteensel M, Broekman M, Jongsma K. Front Hum Neurosci. 2024; 18:1420334.

PMID: 39006157 PMC: 11240287. DOI: 10.3389/fnhum.2024.1420334.

Imagined speech event detection from electrocorticography and its transfer between speech modes and subjects.

de Borman A, Wittevrongel B, Dauwe I, Carrette E, Meurs A, Van Roost D. Commun Biol. 2024; 7(1):818.

PMID: 38969758 PMC: 11226700. DOI: 10.1038/s42003-024-06518-6.

Representation of internal speech by single neurons in human supramarginal gyrus.

Wandelt S, Bjanes D, Pejsa K, Lee B, Liu C, Andersen R. Nat Hum Behav. 2024; 8(6):1136-1149.

PMID: 38740984 PMC: 11199147. DOI: 10.1038/s41562-024-01867-y.

Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models.

Berezutskaya J, Freudenburg Z, Vansteensel M, Aarnoutse E, Ramsey N, van Gerven M. J Neural Eng. 2023; 20(5).

PMID: 37467739 PMC: 10510111. DOI: 10.1088/1741-2552/ace8be.

Overt speech decoding from cortical activity: a comparison of different linear methods.

Le Godais G, Roussel P, Bocquelet F, Aubert M, Kahane P, Chabardes S. Front Hum Neurosci. 2023; 17:1124065.

PMID: 37425292 PMC: 10326283. DOI: 10.3389/fnhum.2023.1124065.
