
Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech

Overview
Date 2022 Jul 22
PMID 35868232
Abstract

Purpose: Some speech recognition data suggest that children rely less than adults on voice pitch and harmonicity to support auditory scene analysis. Two experiments evaluated the development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.

Method: Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differed in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured for all four combinations of target and masker speaking style, including conditions in which the target and masker styles were matched and mismatched.
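The abstract does not specify how the SRTs were tracked; a common approach for this kind of measurement is a transformed up-down adaptive staircase (Levitt, 1971). The sketch below is a minimal, hypothetical illustration of a 1-up/2-down track converging on the 70.7% correct point; the simulated listener, starting SNR, and step sizes are assumptions for illustration, not details taken from the article.

```python
import math
import random


def simulate_listener(snr_db, srt_db=-2.0, slope=0.5):
    """Hypothetical listener: probability correct rises with SNR (logistic)."""
    p_correct = 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))
    return random.random() < p_correct


def run_staircase(trial_fn, start_snr_db=10.0, step_db=4.0, final_step_db=2.0,
                  reversals_needed=8):
    """1-up/2-down adaptive track: SNR drops after two consecutive correct
    responses and rises after one error, converging on ~70.7% correct.
    The SRT estimate is the mean SNR at the later reversals."""
    snr = start_snr_db
    correct_in_a_row = 0
    last_direction = None          # +1 = SNR went up, -1 = SNR went down
    reversals = []
    while len(reversals) < reversals_needed:
        if trial_fn(snr):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue                       # need two correct before stepping down
            correct_in_a_row = 0
            direction = -1                     # two correct -> make the task harder
        else:
            correct_in_a_row = 0
            direction = +1                     # one error -> make the task easier
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)              # track direction change = reversal
        last_direction = direction
        step = step_db if len(reversals) < 2 else final_step_db
        snr += direction * step
    late = reversals[2:]                       # discard early, large-step reversals
    return sum(late) / len(late)


if __name__ == "__main__":
    print(f"Estimated SRT: {run_staircase(simulate_listener):.1f} dB SNR")
```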

Results: Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a holistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could account for the difference in age effects between the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for voiced targets in the one-talker masker.

Conclusions: These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancellation, and differences in energetic masking.

Citing Articles

Effects of linguistic context and noise type on speech comprehension.

Fitzgerald L, DeDe G, Shen J. Front Psychol. 2024; 15:1345619.

PMID: 38375107; PMC: 10875108; DOI: 10.3389/fpsyg.2024.1345619.
