Reasons Why Current Speech-enhancement Algorithms Do Not Improve Speech Intelligibility and Suggested Solutions

Overview

Journal: IEEE Trans Audio Speech Lang Process

Publisher: IEEE

Date: 2011 Sep 13

PMID: 21909285

Citations: 22

Authors

Philipos C Loizou

Gibak Kim

Abstract

Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for this are unclear. This paper presents a theoretical framework for analyzing the factors that can influence the intelligibility of processed speech. More specifically, the framework focuses on a fine-grained analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted in which human listeners are presented with processed speech containing controlled distortions. The aim of these tests is to assess the perceptual effect on intelligibility of the various distortions that speech enhancement algorithms can introduce. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility than others. When these distortions were properly controlled, however, human listeners obtained large gains in intelligibility, even with spectral-subtractive algorithms, which are known to degrade speech quality and intelligibility.
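
The abstract does not spell out how distortions are categorized or controlled, so the following is only a minimal sketch under stated assumptions. It assumes that a distortion is judged per time-frequency bin by comparing the enhanced magnitude with the clean magnitude, that amplification beyond roughly 6 dB is treated as the harmful case, and that "controlling" a distortion means discarding the offending bins. The function names, the label names, and the threshold parameter are hypothetical and chosen for illustration.

```python
def classify_bins(clean_mag, enhanced_mag, amp_limit_db=6.02):
    """Label each bin by the kind of magnitude distortion it shows.

    Assumed (illustrative) categories:
      "attenuation"          enhanced magnitude <= clean magnitude
      "mild_amplification"   enhanced exceeds clean by at most amp_limit_db
      "strong_amplification" enhanced exceeds clean by more than amp_limit_db
    """
    # Convert the dB limit to a linear magnitude ratio (6.02 dB is about 2x).
    limit = 10.0 ** (amp_limit_db / 20.0)
    labels = []
    for x, x_hat in zip(clean_mag, enhanced_mag):
        if x_hat <= x:
            labels.append("attenuation")
        elif x_hat <= limit * x:
            labels.append("mild_amplification")
        else:
            labels.append("strong_amplification")
    return labels


def constrain_bins(clean_mag, enhanced_mag, amp_limit_db=6.02):
    """Zero out bins labeled as strongly amplified.

    This needs the clean magnitudes, so it is an oracle experiment,
    not something a deployed enhancement algorithm could do directly.
    """
    labels = classify_bins(clean_mag, enhanced_mag, amp_limit_db)
    return [
        0.0 if label == "strong_amplification" else x_hat
        for label, x_hat in zip(labels, enhanced_mag)
    ]


# Toy magnitudes for four bins (made-up values, not data from the paper).
clean = [1.0, 1.0, 1.0, 0.5]
enhanced = [0.5, 1.5, 3.0, 0.5]
labels = classify_bins(clean, enhanced)
constrained = constrain_bins(clean, enhanced)
```

Because the constraint relies on the clean signal, a sketch like this is only suited to controlled listening experiments of the kind the abstract describes, where the clean reference is available.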

Citing Articles

Influences of noise reduction on speech intelligibility, listening effort, and sound quality among adults with severe to profound hearing loss.

Dong R, Liu P, Tian X, Wang Y, Chen Y, Zhang J. Front Neurosci. 2024; 18:1407775.

PMID: 39108313 PMC: 11301946. DOI: 10.3389/fnins.2024.1407775.

Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods.

Zheng C, Zhang H, Liu W, Luo X, Li A, Li X. Trends Hear. 2023; 27:23312165231209913.

PMID: 37956661 PMC: 10658184. DOI: 10.1177/23312165231209913.

Individual Listener Preference for Strength of Single-Microphone Noise-Reduction; Trade-off Between Noise Tolerance and Signal Distortion Tolerance.

Reinten I, de Ronde-Brons I, Houben R, Dreschler W. Trends Hear. 2023; 27:23312165231192304.

PMID: 37525630 PMC: 10395179. DOI: 10.1177/23312165231192304.

Enhancement of speech-in-noise comprehension through vibrotactile stimulation at the syllabic rate.

Guilleminot P, Reichenbach T. Proc Natl Acad Sci U S A. 2022; 119(13):e2117000119.

PMID: 35312362 PMC: 9060510. DOI: 10.1073/pnas.2117000119.

Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants.

Kang Y, Zheng N, Meng Q. Front Med (Lausanne). 2021; 8:740123.

PMID: 34820392 PMC: 8606413. DOI: 10.3389/fmed.2021.740123.
