Scene Categorization by Hessian-regularized Active Perceptual Feature Selection

Overview

Journal: Sci Rep

Specialty: Science

Date: 2025 Jan 3

PMID: 39753661

Authors

Junwu Zhou

Fuji Ren

Abstract

Decoding the semantic categories of complex scenes is fundamental to many artificial intelligence (AI) systems. This work presents a method for selecting multi-channel perceptual visual features to recognize scenic images with elaborate spatial structures, centered on a deep hierarchical model that learns human gaze behavior. Using the BING objectness measure, we efficiently localize objects or their details at varying scales within a scene. To emulate how humans attend to semantically or visually significant areas, we propose a robust deep active learning (RDAL) strategy that progressively generates gaze shifting paths (GSPs) and computes deep GSP representations within a unified architecture. A notable advantage of RDAL is its robustness to label noise, achieved through a carefully designed sparse penalty term that discards irrelevant or misleading deep GSP features. A novel Hessian-regularized Feature Selector (HFS) then selects high-quality features from the deep GSP features such that (i) the spatial composition of scenic patches is maintained and (ii) a linear SVM is learned simultaneously. Empirical evaluations on six standard scenic datasets demonstrate our method's superior performance and its ability to distinguish sophisticated scenery categories.
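
The abstract names a sparse penalty and a regularized feature selector but gives no formulas, so the following is only a rough sketch of the general idea, not the authors' HFS. It assumes a common row-sparse formulation, min_W ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}, solved by iteratively reweighted least squares. A k-NN graph Laplacian is swapped in for the Hessian regularizer, and a least-squares loss for the linear SVM. The names `knn_laplacian` and `select_features`, and all parameter values, are hypothetical.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized k-NN graph over the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                       # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbours per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                             # make the graph undirected
    return np.diag(A.sum(axis=1)) - A


def select_features(X, Y, alpha=0.1, beta=10.0, n_iter=30, eps=1e-8):
    """Score each feature by the row norm of a row-sparse regression matrix W.

    L is a graph Laplacian here: an ASSUMED stand-in for the Hessian
    regularizer named in the abstract, which is not specified there.
    """
    Xc = X - X.mean(axis=0)                  # center features and targets
    Yc = Y - Y.mean(axis=0)
    L = knn_laplacian(Xc)
    G = Xc.T @ Xc + alpha * (Xc.T @ L @ Xc)  # data term plus smoothness term
    B = Xc.T @ Yc
    D = np.eye(X.shape[1])
    for _ in range(n_iter):
        W = np.linalg.solve(G + beta * D, B)        # closed-form update for fixed D
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))  # reweighting for the l2,1 term
    return row_norms


# Toy data (hypothetical): only the first three of 20 features carry label signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
labels = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)
Y = np.eye(2)[labels]                        # one-hot targets

scores = select_features(X, Y)
top3 = np.argsort(scores)[-3:]
print(sorted(top3.tolist()))
```

Features with larger scores are kept. In a pipeline like the one described, the selected columns would then feed a classifier such as a linear SVM; here the selection step is shown on its own.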
