
Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes?

Overview
Journal Cogn Sci
Specialty Psychology
Date 2021 Jun 25
PMID 34170027
Citations 4
Abstract

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in image classification after extensive training. Although the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low (LSF) or high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased for both humans and CNNs as degradation was reduced; however, the thresholds for accurate categorization differed between humans and CNNs. More pronounced differences were observed for HSF information than for the other two types of degradation, both in overall accuracy and in image-level agreement between humans and CNNs. The CNNs' difficulty in categorizing high-passed natural scenes was reduced by image whitening, a procedure inspired by how biological visual systems process natural images. The results are discussed in relation to adaptation to regularities in the visual environment (scene statistics): if the visual characteristics of the environment are not learned by CNNs, their visual categorization may rely on only a subset of the visual information used by humans, for example, low spatial frequency information.
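The three image manipulations named in the abstract (low/high spatial frequency filtering, phase scrambling, and whitening) are all operations on the 2D Fourier spectrum of an image. The following is a minimal sketch of each, assuming grayscale float images; the function names and parameter values (e.g., the cutoff frequency) are illustrative and do not reproduce the paper's exact stimuli:

```python
import numpy as np

def fft_filter(img, cutoff, mode="low"):
    """Keep spatial frequencies below (low-pass) or above (high-pass)
    a radial cutoff, expressed in cycles per image."""
    f = np.fft.fftshift(np.fft.fft2(img))          # DC component at center
    h, w = img.shape
    yy, xx = np.ogrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt(xx ** 2 + yy ** 2)            # distance from DC in cycles
    mask = radius <= cutoff if mode == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def phase_scramble(img, seed=None):
    """Randomize the phase spectrum while preserving the amplitude spectrum.
    Using the phase of a real noise image keeps the spectrum
    conjugate-symmetric, so the result is real-valued."""
    rng = np.random.default_rng(seed)
    f = np.fft.fft2(img)
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    return np.real(np.fft.ifft2(np.abs(f) * np.exp(1j * noise_phase)))

def whiten(img, eps=1e-8):
    """Flatten the amplitude spectrum (equal energy at all frequencies),
    keeping only phase information; eps avoids division by zero."""
    f = np.fft.fft2(img)
    return np.real(np.fft.ifft2(f / (np.abs(f) + eps)))
```

Because the low-pass and high-pass masks partition the spectrum, `fft_filter(img, c, "low") + fft_filter(img, c, "high")` reconstructs the original image; natural scenes concentrate most of their energy at low spatial frequencies, which is why whitening (equalizing energy across frequencies) can change what a classifier picks up from high-passed images.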

Citing Articles

Novel applications of Convolutional Neural Networks in the age of Transformers.

Ersavas T, Smith M, Mattick J. Sci Rep. 2024; 14(1):10000.

PMID: 38693215 PMC: 11063149. DOI: 10.1038/s41598-024-60709-z.


Automatic identification of stone-handling behaviour in Japanese macaques using LabGym artificial intelligence.

Ardoin T, Sueur C. Primates. 2024; 65(3):159-172.

PMID: 38520479 DOI: 10.1007/s10329-024-01123-x.


Evaluation of Structural Retinal Layer Alterations in Retinitis Pigmentosa.

Yavuzer K, Citirik M, Yavuzer B. Rom J Ophthalmol. 2024; 67(4):326-336.

PMID: 38239428 PMC: 10793365. DOI: 10.22336/rjo.2023.53.


Luminance and timing control during visual presentation of natural scenes.

De Cesarei A, Marzocchi M, Codispoti M. HardwareX. 2022; 12:e00376.

PMID: 36437839 PMC: 9682347. DOI: 10.1016/j.ohx.2022.e00376.
