
Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech

Overview
Journal Curr Biol
Publisher Cell Press
Specialty Biology
Date 2019 May 28
PMID 31130454
Citations 56
Abstract

When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that, to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information-theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results (the improved performance of an encoding model in which annotated linguistic and acoustic features were combined, and the decoding of phoneme subgroups from phoneme-locked responses) can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, to further our understanding of human cortical responses to speech, we should also explore low-level, parsimonious explanations for apparent high-level phenomena.
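The kind of analysis the abstract describes can be illustrated with a short sketch. This is not the authors' code; it is a minimal illustration assuming an acoustic-edge feature computed as the half-wave-rectified derivative of the Hilbert envelope, and a ridge-regression temporal response function (TRF) mapping the lagged feature to a single MEG channel. The function names (acoustic_edges, fit_trf) and all parameter values are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def acoustic_edges(audio, fs, lp_hz=30.0):
    """Broadband envelope and its half-wave-rectified derivative ('acoustic edges')."""
    env = np.abs(hilbert(audio))                   # amplitude envelope via analytic signal
    b, a = butter(4, lp_hz / (fs / 2), btype="low")
    env = filtfilt(b, a, env)                      # smooth the envelope
    edges = np.clip(np.diff(env, prepend=env[0]), 0.0, None)  # keep rises only (onsets)
    return env, edges

def fit_trf(stim, meg, lags, alpha=1.0):
    """Ridge-regression TRF: predict one MEG channel from time-lagged stimulus copies."""
    # Circular shifts keep the sketch short; real analyses pad or trim the edges instead.
    X = np.column_stack([np.roll(stim, lag) for lag in lags])
    XtX = X.T @ X + alpha * np.eye(X.shape[1])     # ridge-regularized normal equations
    return np.linalg.solve(XtX, X.T @ meg)         # one TRF weight per lag
```

In practice, such encoding models are compared by correlating predicted with held-out responses; the abstract's claim is that an edge-based acoustic model of this kind suffices to explain effects previously attributed to phoneme-level features.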

Citing Articles

A listening advantage for native speech is reflected by attention-related activity in auditory cortex.

Liang M, Gerwien J, Gutschalk A. Commun Biol. 2025; 8(1):180.

PMID: 39910341 PMC: 11799217. DOI: 10.1038/s42003-025-07601-2.


Measuring self-similarity in empirical signals to understand musical beat perception.

Lenc T, Lenoir C, Keller P, Polak R, Mulders D, Nozaradan S. Eur J Neurosci. 2025; 61(2):e16637.

PMID: 39853878 PMC: 11760665. DOI: 10.1111/ejn.16637.


Language-specific neural dynamics extend syntax into the time domain.

Coopmans C, de Hoop H, Tezcan F, Hagoort P, Martin A. PLoS Biol. 2025; 23(1):e3002968.

PMID: 39836653 PMC: 11750093. DOI: 10.1371/journal.pbio.3002968.


Quantifying the diverse contributions of hierarchical muscle interactions to motor function.

O'Reilly D, Shaw W, Hilt P, de Castro Aguiar R, Astill S, Delis I. iScience. 2025; 28(1):111613.

PMID: 39834869 PMC: 11742840. DOI: 10.1016/j.isci.2024.111613.


Neural Speech Tracking Contribution of Lip Movements Predicts Behavioral Deterioration When the Speaker's Mouth Is Occluded.

Reisinger P, Gillis M, Suess N, Vanthornhout J, Haider C, Hartmann T. eNeuro. 2025; 12(2).

PMID: 39819839 PMC: 11801124. DOI: 10.1523/ENEURO.0368-24.2024.

