Anchor Objects Drive Realism While Diagnostic Objects Drive Categorization in GAN Generated Scenes

Overview

Journal Commun Psychol

Publisher Nature Portfolio

Date 2024 Sep 6

PMID 39242968

Authors

Aylin Kallmayer

Melissa L-H Vo

Abstract

Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects), but likely also reflect co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, derived from object clustering statistics in real-world scenes, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N = 50, N = 44), we investigate which of these properties underlie scene understanding across two dimensions, realism and categorisation, using scenes generated by Generative Adversarial Networks (GANs), which naturally vary along these dimensions. We show that anchor objects and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs) drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results are testament to the visual system's ability to pick up on reliable, category-specific sources of information that are flexible towards disturbances across the visual feature hierarchy.
