
Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli

Overview
Journal J Neurosci
Specialty Neurology
Date 2021 Sep 10
PMID 34503996
Citations 8
Abstract

In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and to audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were predicted more accurately when both phonological and acoustic features were included. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to those measured with more controlled audio-only stimuli.

Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models in which EEG data are predicted from a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to the simpler stimuli typically used in sensory neuroscience experiments.
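The multivariate temporal receptive field (mTRF) encoding approach described in the abstract amounts to a regularized linear regression from time-lagged stimulus features to each EEG channel. Below is a minimal numpy sketch of that idea; the lag window, ridge penalty, and toy feature/channel counts are illustrative assumptions for this sketch, not the authors' actual settings or pipeline.

```python
# Minimal sketch of an mTRF encoding model: ridge regression from
# time-lagged stimulus features to EEG. All sizes and parameters below
# are illustrative assumptions, not the paper's settings.
import numpy as np

def lagged_design(stim, lags):
    """Stack time-shifted copies of the stimulus features (n_times, n_feat)."""
    n_times, n_feat = stim.shape
    X = np.zeros((n_times, n_feat * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(stim, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0.0  # zero out samples wrapped from the end
        X[:, i * n_feat:(i + 1) * n_feat] = shifted
    return X

def fit_mtrf(stim, eeg, lags, alpha=1e3):
    """Ridge weights mapping lagged stimulus features to each EEG channel."""
    X = lagged_design(stim, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)  # (n_feat * n_lags, n_channels)

# Toy usage: 60 s at 128 Hz, 3 stimulus features, 64 EEG channels.
rng = np.random.default_rng(0)
sfreq, n_times = 128, 128 * 60
stim = rng.standard_normal((n_times, 3))   # e.g. envelope, pitch, a phonetic feature
eeg = rng.standard_normal((n_times, 64))   # placeholder EEG data
lags = np.arange(0, int(0.4 * sfreq))      # 0-400 ms, a common TRF window
W = fit_mtrf(stim, eeg, lags)
pred = lagged_design(stim, lags) @ W
# Prediction accuracy per channel: correlation of predicted vs. actual EEG
r = [np.corrcoef(pred[:, c], eeg[:, c])[0, 1] for c in range(eeg.shape[1])]
```

In practice this kind of model is fit with cross-validated regularization and evaluated on held-out data, which is how generalization from the naturalistic movie-trailer stimuli to the controlled speech corpus would be assessed.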

Citing Articles

Exploring Relevant Features for EEG-Based Investigation of Sound Perception in Naturalistic Soundscapes.

Haupt T, Rosenkranz M, Bleichner M. eNeuro. 2025; 12(1).

PMID: 39753371 PMC: 11747973. DOI: 10.1523/ENEURO.0287-24.2024.


A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts.

Desai M, Field A, Hamilton L. PLoS Comput Biol. 2024; 20(9):e1012433.

PMID: 39250485 PMC: 11412666. DOI: 10.1371/journal.pcbi.1012433.


Sensory and Perceptual Decisional Processes Underlying the Perception of Reverberant Auditory Environments.

Garcia-Lazaro H, Teng S. eNeuro. 2024; 11(8).

PMID: 39122554 PMC: 11335967. DOI: 10.1523/ENEURO.0122-24.2024.


Speech-induced suppression during natural dialogues.

Gonzalez J, Nieto N, Brusco P, Gravano A, Kamienkowski J. Commun Biol. 2024; 7(1):291.

PMID: 38459110 PMC: 10923813. DOI: 10.1038/s42003-024-05945-9.


Spatiotemporal consistency of neural responses to repeatedly presented video stimuli accounts for population preferences.

Hoshi A, Hirayama Y, Saito F, Ishiguro T, Suetani H, Kitajo K. Sci Rep. 2023; 13(1):5532.

PMID: 37015982 PMC: 10073227. DOI: 10.1038/s41598-023-31751-0.

