Interference of Mid-level Sound Statistics Underlie Human Speech Recognition Sensitivity in Natural Noise

Overview
Journal bioRxiv
Date 2024 Feb 26
PMID 38405870
Abstract

Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill whose difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving and coding natural sounds, it is less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. We demonstrate that human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectro-temporal statistics, we show that speech recognition accuracy at a fixed noise level varies widely across natural backgrounds, from 0% to 100%. Furthermore, for each background, the unique interference created by its summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed a framework that links summary statistics from a neural model to word recognition accuracy. Whereas a peripheral cochlear model accounts for only 60% of perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for more than 90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they affect recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation in which multiple summary statistics interfere with speech, affecting recognition beneficially or detrimentally depending on the background sound.
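
The idea of a regression framework that maps summary statistics to single-trial judgments can be illustrated with a minimal sketch. The abstract does not specify the form of the regression, so this example assumes a plain logistic regression, and it uses synthetic data rather than outputs of the cochlear or midbrain models. The two features, the weights in `true_w`, and the function names are all hypothetical. The sketch only shows how the sign of a fitted weight could indicate whether a statistic tends to hinder (negative) or help (positive) recognition.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of a correct response; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit weights by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic trials (assumption): two standardized summary statistics per trial.
# Hypothetical ground truth: statistic 1 hinders recognition, statistic 2 helps.
rng = random.Random(0)
true_w = [0.5, -2.0, 1.5]  # bias, weight for statistic 1, weight for statistic 2
X, y = [], []
for _ in range(500):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    p_correct = predict_proba(true_w, x)
    X.append(x)
    y.append(1 if rng.random() < p_correct else 0)  # 1 = word recognized

w = fit_logistic(X, y)
print("fitted weights (bias, stat 1, stat 2):", [round(v, 2) for v in w])
```

In this toy setting, the fitted weights play the role of the perceptual weights mentioned above: their signs and magnitudes summarize how each statistic shifts the probability of a correct response. A real analysis would replace the synthetic features with statistics computed from a model of auditory processing and would need cross-validation to estimate explained variance.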
