
Multimodal and Spectral Degradation Effects on Speech and Emotion Recognition in Adult Listeners

Overview

Journal: Trends Hear
Date: 2018 Nov 1
PMID: 30378469
Citations: 1
Abstract

For cochlear implant (CI) users, degraded spectral input hampers the understanding of prosodic vocal emotion, especially in difficult listening conditions. Using a vocoder simulation of CI hearing, we examined the extent to which informative multimodal cues in a talker's spoken expressions improve normal-hearing (NH) adults' speech and emotion perception under different levels of spectral degradation (two, three, four, and eight spectral bands). Participants repeated the words verbatim and identified emotions (among four alternatives: happy, sad, angry, and neutral) in meaningful sentences that were semantically congruent with the intended emotion. Sentences were presented in their natural speech form and in speech processed through a noise-band vocoder, in audio-only (auditory) and audio-video (auditory-visual) recordings of a female talker. Visual information provided a more pronounced benefit to speech recognition in the lower spectral band conditions. Spectral degradation, however, did not interfere with emotion recognition when dynamic visual cues in the talker's expression were provided: participants scored at ceiling across all spectral band conditions. Our use of familiar sentences containing congruent semantic and prosodic information has high ecological validity, which likely optimized listener performance under simulated CI hearing and may better predict CI users' outcomes in everyday listening contexts.
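For readers unfamiliar with noise-band vocoding, the sketch below illustrates the general technique used to simulate CI hearing: the speech signal is split into a small number of frequency bands, the temporal envelope of each band is extracted and used to modulate band-limited noise, and the modulated bands are summed. The band edges (200 Hz to 7 kHz, logarithmically spaced), filter orders, and envelope cutoff here are illustrative assumptions, not the parameters used in the study.

```python
# Minimal noise-band vocoder sketch (assumed parameters; not the study's exact settings).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(signal, fs, n_bands=4, f_lo=200.0, f_hi=7000.0):
    """Replace spectral fine structure with envelope-modulated noise in n_bands channels."""
    # Logarithmically spaced band edges across the analysis range (assumption).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    carrier_noise = rng.standard_normal(len(signal))
    # Envelope smoothing filter; ~160 Hz cutoff is a common choice in vocoder studies.
    env_sos = butter(4, 160.0, btype="low", fs=fs, output="sos")
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        # Half-wave rectify and low-pass filter to extract the temporal envelope.
        envelope = sosfiltfilt(env_sos, np.maximum(band, 0.0))
        # Modulate band-limited noise with the envelope, then refilter to the band.
        noise_band = sosfiltfilt(band_sos, carrier_noise)
        out += sosfiltfilt(band_sos, envelope * noise_band)
    # Match overall RMS level to the input.
    out *= np.sqrt(np.mean(signal**2) / (np.mean(out**2) + 1e-12))
    return out
```

With fewer bands (e.g., two), less spectral detail survives and speech recognition from audio alone degrades, which is the manipulation the study crosses with auditory-only versus auditory-visual presentation.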

Citing Articles

Age-Related Changes in the Perception of Emotions in Speech: Assessing Thresholds of Prosody and Semantics Recognition in Noise for Young and Older Adults.

Dor Y, Algom D, Shakuf V, Ben-David B. Front Neurosci. 2022; 16:846117.

PMID: 35546888; PMC: 9082150. DOI: 10.3389/fnins.2022.846117.
