
Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech

Overview
Date 2022 Jul 22
PMID 35868232
Abstract

Purpose: Some speech recognition data suggest that children rely less than adults on voice pitch and harmonicity to support auditory scene analysis. Two experiments evaluated the development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.

Method: Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differed in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured for all four combinations of target and masker speaking style, including conditions in which the target and masker styles were matched and mismatched.
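The abstract does not specify how the SRTs were tracked; a common approach for this kind of measurement is a transformed up-down adaptive staircase (Levitt, 1971). The sketch below is a minimal, hypothetical illustration of a 1-up/2-down track converging on the 70.7% correct point; the simulated listener, starting SNR, and step sizes are assumptions for illustration, not details taken from the article.

```python
import math
import random


def simulate_listener(snr_db, srt_db=-2.0, slope=0.5):
    """Hypothetical listener: probability correct rises with SNR (logistic)."""
    p_correct = 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))
    return random.random() < p_correct


def run_staircase(trial_fn, start_snr_db=10.0, step_db=4.0, final_step_db=2.0,
                  reversals_needed=8):
    """1-up/2-down adaptive track: SNR drops after two consecutive correct
    responses and rises after one error, converging on ~70.7% correct.
    The SRT estimate is the mean SNR at the later reversals."""
    snr = start_snr_db
    correct_in_a_row = 0
    last_direction = None          # +1 = SNR went up, -1 = SNR went down
    reversals = []
    while len(reversals) < reversals_needed:
        if trial_fn(snr):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue                       # need two correct before stepping down
            correct_in_a_row = 0
            direction = -1                     # two correct -> make the task harder
        else:
            correct_in_a_row = 0
            direction = +1                     # one error -> make the task easier
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)              # track direction change = reversal
        last_direction = direction
        step = step_db if len(reversals) < 2 else final_step_db
        snr += direction * step
    late = reversals[2:]                       # discard early, large-step reversals
    return sum(late) / len(late)


if __name__ == "__main__":
    print(f"Estimated SRT: {run_staircase(simulate_listener):.1f} dB SNR")
```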

Results: Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a holistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could account for the difference in age effects between the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for voiced targets in the one-talker masker.

Conclusions: These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancellation, and differences in energetic masking.

Citing Articles

Effects of linguistic context and noise type on speech comprehension.

Fitzgerald L, DeDe G, Shen J. Front Psychol. 2024; 15:1345619.

PMID: 38375107; PMC: 10875108; DOI: 10.3389/fpsyg.2024.1345619.
