Computational Models of Auditory Scene Analysis: A Review

Overview

Journal: Front Neurosci

Date: 2016 Nov 30

PMID: 27895552

Citations: 12

Authors

Beata T Szabo

Susan L Denham

Istvan Winkler

Abstract

Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects that represent either the physical sources or the temporal sound patterns (such as melodies) that contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. Because important current theories suggest that perception is inherently predictive, we also consider the extent to which the models include predictive processes, as well as how they have been evaluated. We conclude that current computational models of ASA are fragmentary: rather than offering general, competing accounts of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility of integrating complementary aspects of the models into a more comprehensive theory of ASA.

Citing Articles

Design and evaluation of a global workspace agent embodied in a realistic multimodal environment.

Dossa R, Arulkumaran K, Juliani A, Sasai S, Kanai R. Front Comput Neurosci. 2024; 18:1352685.

PMID: 38948336 PMC: 11211627. DOI: 10.3389/fncom.2024.1352685.

Simultaneous relative cue reliance in speech-on-speech masking.

Lutfi R, Zandona M, Lee J. J Acoust Soc Am. 2023; 154(4):2530-2538.

PMID: 37870932 PMC: 10708949. DOI: 10.1121/10.0021874.

A biologically oriented algorithm for spatial sound segregation.

Chou K, Boyd A, Best V, Colburn H, Sen K. Front Neurosci. 2022; 16:1004071.

PMID: 36312015 PMC: 9614053. DOI: 10.3389/fnins.2022.1004071.

Intention-based predictive information modulates auditory deviance processing.

Widmann A, Schröger E. Front Neurosci. 2022; 16:995119.

PMID: 36248631 PMC: 9554204. DOI: 10.3389/fnins.2022.995119.

Making sense of periodicity glimpses in a prediction-update-loop-A computational model of attentive voice tracking.

Luberadzka J, Kayser H, Hohmann V. J Acoust Soc Am. 2022; 151(2):712.

PMID: 35232067 PMC: 9088677. DOI: 10.1121/10.0009337.
