
Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study

Overview
Date: 2012 Aug 8
PMID: 22868572
Citations: 68
Abstract

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that differ from their surroundings. The latter are often referred to as "visual saliency." Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested on different datasets (e.g., synthetic psychological search arrays, natural images, or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings, which has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of the datasets reveals that they are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast yet yield competitive eye movement prediction accuracy. Different models often find the same stimuli easy or difficult. Furthermore, we discuss several concerns in visual saliency modeling, eye movement datasets, and evaluation scores, and provide insights for future work. Our study allows one to assess the state of the art, helps to organize this rapidly growing field, and establishes a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
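The abstract refers to "three evaluation scores" for comparing model saliency maps against human eye tracking but does not define them here. As a rough illustration only, below is a minimal NumPy sketch of two scores commonly used in this literature, Normalized Scanpath Saliency (NSS) and the linear correlation coefficient (CC). The function names, array shapes, and exact score definitions are assumptions for illustration, not taken from the paper itself.

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency (assumed definition): mean of the
    z-scored saliency values at human fixation locations. A score of 0
    means chance-level agreement; higher is better."""
    z = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-12)
    return z[fixation_map.astype(bool)].mean()

def cc(saliency_map, fixation_density):
    """Linear correlation coefficient between a model saliency map and a
    smoothed human fixation density map, both flattened to vectors."""
    return np.corrcoef(saliency_map.ravel(), fixation_density.ravel())[0, 1]

# Toy usage with random data; a real evaluation would use recorded fixations.
rng = np.random.default_rng(0)
sal = rng.random((480, 640))           # hypothetical model saliency map
fix = rng.random((480, 640)) > 0.999   # hypothetical binary fixation map
density = rng.random((480, 640))       # hypothetical fixation density map
print(f"NSS = {nss(sal, fix):.3f}, CC = {cc(sal, density):.3f}")
```

Note that, as the abstract observes, scores like these can be inflated by center bias in the datasets, since a static central blob can correlate well with fixation density regardless of image content.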

Citing Articles

The Visual Integration of Semantic and Spatial Information of Objects in Naturalistic Scenes (VISIONS) database: attentional, conceptual, and perceptual norms.

Allegretti E, D'Innocenzo G, Coco M. Behav Res Methods. 2025; 57(1):42.

PMID: 39753746. DOI: 10.3758/s13428-024-02535-9.


Investigating causal effects of pupil size on visual discrimination and visually evoked potentials in an optotype discrimination task.

Chin H, Tai Y, Yep R, Chang Y, Hsu C, Wang C. Front Neurosci. 2024; 18:1412527.

PMID: 39411147. PMC: 11473405. DOI: 10.3389/fnins.2024.1412527.


Distinct eye movement patterns to complex scenes in Alzheimer's disease and Lewy body disease.

Yamada Y, Shinkawa K, Kobayashi M, Nemoto M, Ota M, Nemoto K. Front Neurosci. 2024; 18:1333894.

PMID: 38646608. PMC: 11026598. DOI: 10.3389/fnins.2024.1333894.


Human attention during goal-directed reading comprehension relies on task optimization.

Zou J, Zhang Y, Li J, Tian X, Ding N. eLife. 2023; 12.

PMID: 38032825. PMC: 10688971. DOI: 10.7554/eLife.87197.


Nonlocal contrast calculated by the second order visual mechanisms and its significance in identifying facial emotions.

Babenko V, Yavna D, Ermakov P, Anokhina P. F1000Res. 2023; 10:274.

PMID: 37767361. PMC: 10521119. DOI: 10.12688/f1000research.28396.2.