
Probing the Link Between Vision and Language in Material Perception Using Psychophysics and Unsupervised Learning

Overview
Specialty: Biology
Date: 2024 Oct 3
PMID: 39361707
Abstract

We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and to communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compare the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language at the categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences than verbal descriptions. Our results show that while verbal descriptions capture material qualities at a coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representations of materials obtained from various pre-trained deep neural networks, we find that the similarity structures in human visual judgments align more closely with those of vision-language models than with those of purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.
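
As a rough illustration of the stimulus-generation step, the sketch below morphs one material appearance into another by linearly interpolating latent codes of a pre-trained generative image model. The abstract says the authors interpolate between generative models but does not specify the architecture; the generator handle `G` and the latent names here are hypothetical.

```python
# Hypothetical sketch: create a material-morphing continuum by latent
# interpolation. `G` is an assumed pre-trained generator (e.g., a
# StyleGAN-like network) mapping a latent code to an image.
import numpy as np

def morph_materials(G, w_material_a, w_material_b, n_steps=9):
    """Return images spanning material A to material B, including the
    ambiguous in-between appearances used as stimuli."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    latents = [(1 - a) * w_material_a + a * w_material_b for a in alphas]
    return [G(w) for w in latents]
```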
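One generic way to obtain the categorical-level vision-language correlation the abstract reports is representational similarity analysis: correlate the dissimilarity structure of visual similarity judgments with that of embedded verbal descriptions. This is a standard RSA sketch, not the paper's exact pipeline; the variable names are illustrative.

```python
# Generic representational-similarity sketch (not the authors' code):
# compare two square, symmetric dissimilarity matrices (RDMs), one from
# visual similarity judgments, one from verbal-description embeddings.
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def rdm_correlation(rdm_vision, rdm_language):
    """Spearman correlation between the condensed upper triangles of
    two RDMs over the same set of material images."""
    x = squareform(rdm_vision, checks=False)    # square -> condensed vector
    y = squareform(rdm_language, checks=False)
    rho, p = spearmanr(x, y)
    return rho, p
```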
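The abstract does not name its unsupervised alignment method. One standard choice for aligning two similarity structures without paired supervision is Gromov-Wasserstein optimal transport, sketched below with the POT library; treat it as an assumed stand-in rather than the authors' implementation.

```python
# Assumed sketch: unsupervised alignment of two dissimilarity matrices
# via Gromov-Wasserstein optimal transport (POT library).
import numpy as np
import ot

def align_representations(D_vision, D_language):
    """Find a soft image-to-image correspondence between two
    dissimilarity structures, with no paired labels."""
    n, m = D_vision.shape[0], D_language.shape[0]
    p = np.full(n, 1.0 / n)   # uniform weight on each image
    q = np.full(m, 1.0 / m)
    T = ot.gromov.gromov_wasserstein(D_vision, D_language, p, q,
                                     loss_fun='square_loss')
    return T
```

If the two modalities shared the same fine-grained structure, the coupling T would concentrate near a one-to-one matching of each image with itself; diffuse off-diagonal mass is one way to quantify the image-level misalignment the abstract describes, especially for morphed, ambiguous materials.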
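For the model comparison, a vision-language model such as CLIP can supply an image RDM to test against human visual judgments (via `rdm_correlation` above), alongside RDMs from purely vision-based networks. The model and pretrained weights below are our own choices for illustration, not those used in the paper.

```python
# Assumed sketch: build a model RDM from CLIP image embeddings using the
# open_clip package; compare against human judgments with rdm_correlation.
import torch
import open_clip
from scipy.spatial.distance import pdist, squareform

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_s34b_b79k')

def model_rdm(pil_images):
    """Cosine-distance RDM over CLIP image embeddings."""
    with torch.no_grad():
        x = torch.stack([preprocess(im) for im in pil_images])
        emb = model.encode_image(x)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize
    return squareform(pdist(emb.numpy(), metric='cosine'))
```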
