Dataset Size Considerations for Robust Acoustic and Phonetic Speech Encoding Models in EEG

Overview

Journal Front Hum Neurosci

Specialty Neurology

Date 2023 Feb 6

PMID 36741776

Authors

Maansi Desai

Alyssa M Field

Liberty S Hamilton

Abstract

In many experiments that investigate auditory and speech processing in the brain using electroencephalography (EEG), the experimental paradigm is often lengthy and tedious. Typically, the experimenter errs on the side of including more data and more trials, and therefore conducts a longer task, to ensure that the data are robust and effects are measurable. Recent studies have used naturalistic stimuli to investigate the brain's response to individual speech features, or to combinations of them, using system identification techniques such as multivariate temporal receptive field (mTRF) analyses. The neural data collected from such experiments must be divided into a training set and a test set to fit and validate the mTRF weights. While a good strategy is clearly to collect as much data as is feasible, it is unclear how much data are needed to achieve stable results. Furthermore, it is unclear whether the specific stimulus used for mTRF fitting and the choice of feature representation affect how much data are required for robust and generalizable results. Here, we used previously collected EEG data from our lab, recorded with sentence stimuli and movie stimuli, as well as EEG data from an open-source dataset recorded with audiobook stimuli, to better understand how much data need to be collected for naturalistic speech experiments measuring acoustic and phonetic tuning. We found that the EEG receptive field structure tested here stabilizes after collecting a training dataset of approximately 200 s of TIMIT sentences, around 600 s of movie trailers, and approximately 460 s of audiobook data. Thus, we provide suggestions on the minimum amount of data that would be necessary for fitting mTRFs from naturalistic listening data. Our findings are motivated by highly practical concerns when working with children, patient populations, or others who may not tolerate long study sessions. These findings will aid future researchers who wish to study naturalistic speech processing in healthy and clinical populations while minimizing participant fatigue and retaining signal quality.
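The core idea of the abstract (fit mTRF weights on a training set, validate on a held-out test set, and ask how much training data is needed before performance levels off) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a plain closed-form ridge regression on time-lagged stimulus features, a single EEG channel, and synthetic data. The sampling rate, lag count, regularization strength, noise level, and all function names below are hypothetical choices made only for illustration.

```python
import numpy as np


def lag_matrix(stim, n_lags):
    """Stack time-delayed copies of stim.

    stim: array of shape (n_samples, n_features).
    Returns an array of shape (n_samples, n_features * n_lags), where
    block `lag` holds the stimulus delayed by `lag` samples (zero-padded).
    """
    n_samples, n_features = stim.shape
    lagged = np.zeros((n_samples, n_features * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_features, (lag + 1) * n_features)
        lagged[lag:, cols] = stim[:n_samples - lag]
    return lagged


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: solve (X'X + alpha*I) w = X'y."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_params), X.T @ y)


def training_size_curve(stim, eeg, fs, train_durations_s, test_duration_s,
                        n_lags, alpha):
    """Fit on the first `dur` seconds, score on a fixed held-out final block.

    Returns {training duration in s: correlation between predicted and
    actual EEG on the test block}.
    """
    X = lag_matrix(stim, n_lags)
    n_test = int(test_duration_s * fs)
    X_test, y_test = X[-n_test:], eeg[-n_test:]
    scores = {}
    for dur in train_durations_s:
        n_train = int(dur * fs)
        if n_train > X.shape[0] - n_test:
            raise ValueError("training block would overlap the test block")
        weights = fit_ridge(X[:n_train], eeg[:n_train], alpha)
        prediction = X_test @ weights
        scores[dur] = float(np.corrcoef(prediction, y_test)[0, 1])
    return scores


# Synthetic demonstration (all values are assumptions, not from the paper).
rng = np.random.default_rng(0)
fs = 64                      # hypothetical sampling rate in Hz
n_lags = 20                  # hypothetical number of time lags
stim = rng.standard_normal((fs * 300, 1))   # 300 s of a 1-D stimulus feature
true_kernel = np.hanning(n_lags)            # made-up "receptive field"
eeg = lag_matrix(stim, n_lags) @ true_kernel
eeg = eeg + 5.0 * rng.standard_normal(eeg.shape)  # additive noise

scores = training_size_curve(
    stim, eeg, fs,
    train_durations_s=[10, 50, 200],
    test_duration_s=60,
    n_lags=n_lags,
    alpha=1.0,
)
for dur, r in scores.items():
    print(f"{dur:>4d} s of training data -> test correlation r = {r:.3f}")
```

Plotting the resulting correlations against training duration gives a curve of the kind the abstract alludes to. In a real analysis the regularization parameter would normally be chosen by cross-validation within the training set, and the curve would be computed per channel and per participant.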

Citing Articles

Neural tracking of natural speech: an effective marker for post-stroke aphasia.

De Clercq P, Kries J, Mehraram R, Vanthornhout J, Francart T, Vandermosten M. Brain Commun. 2025; 7(2):fcaf095.

PMID: 40066108 PMC: 11891514. DOI: 10.1093/braincomms/fcaf095.

A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts.

Desai M, Field A, Hamilton L. PLoS Comput Biol. 2024; 20(9):e1012433.

PMID: 39250485 PMC: 11412666. DOI: 10.1371/journal.pcbi.1012433.
