
ASNet: Auto-Augmented Siamese Neural Network for Action Recognition

Overview
Journal: Sensors (Basel)
Publisher: MDPI
Specialty: Biotechnology
Date: 2021 Jul 24
PMID: 34300460
Citations: 2
Abstract

Human action recognition methods for video based on deep convolutional neural networks typically use random cropping or its variants for data augmentation. However, this traditional augmentation strategy may generate many non-informative samples (video patches covering only a small part of the foreground, or only the background) that are unrelated to the specific action. Such samples act as noisy samples with incorrect labels and degrade overall recognition performance. In this paper, we attempt to mitigate the impact of noisy samples by proposing an Auto-augmented Siamese Neural Network (ASNet). In this framework, salient patches and randomly cropped samples are backpropagated in the same iteration, so that the salient-patch gradients compensate for the adverse gradients of non-informative samples. Salient patches are samples that contain the information critical for recognizing a human action. Their generation is formulated as a Markov decision process, and a reinforcement learning agent called the Salient Patch Agent (SPA) is introduced to extract patches in a weakly supervised manner without extra labels. Extensive experiments on two well-known datasets, UCF-101 and HMDB-51, verify the effectiveness of the proposed SPA and ASNet.
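The gradient-compensation idea in the abstract can be sketched in a few lines. The following is a hypothetical NumPy illustration, not the authors' code: a linear softmax classifier stands in for the network, and the cross-entropy gradients of a randomly cropped sample and a salient patch from the same video are averaged into one weight update, so a non-informative random crop cannot dominate the step. All names (`grad_ce`, `x_random`, `x_salient`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_ce(W, x, y):
    """Gradient of cross-entropy loss w.r.t. a linear classifier W (classes x features)."""
    p = softmax(W @ x)
    p[y] -= 1.0          # dL/dz for softmax + cross-entropy
    return np.outer(p, x)

n_classes, n_feat = 5, 16
W = rng.normal(scale=0.1, size=(n_classes, n_feat))

x_random = rng.normal(size=n_feat)   # feature of a random crop (possibly non-informative)
x_salient = rng.normal(size=n_feat)  # feature of a salient patch (informative)
y = 2                                # ground-truth action label

# Backpropagate both samples in the same iteration: the salient-patch
# gradient compensates the possibly misleading random-crop gradient.
lr = 0.1
g = 0.5 * (grad_ce(W, x_random, y) + grad_ce(W, x_salient, y))
W -= lr * g
```

In ASNet the two branches of the Siamese network share weights, so summing (or averaging) the two losses before the backward pass has exactly this averaging effect on the shared parameters.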

Citing Articles

Action recognition using attention-based spatio-temporal VLAD networks and adaptive video sequences optimization.

Weng Z, Li X, Xiong S. Sci Rep. 2024; 14(1):26202.

PMID: 39482337. PMC: 11527889. DOI: 10.1038/s41598-024-75640-6.


MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.

Zhang Y. Sensors (Basel). 2022; 22(17).

PMID: 36081054. PMC: 9460449. DOI: 10.3390/s22176595.

References
1.
Du W, Wang Y, Qiao Y. Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos. IEEE Trans Image Process. 2018; 27(3):1347-1360. DOI: 10.1109/TIP.2017.2778563.

2.
Weng Z, Jin Z, Chen S, Shen Q, Ren X, Li W. Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition. Comput Intell Neurosci. 2021; 2021:8890808. PMC: 8024088. DOI: 10.1155/2021/8890808.

3.
Ji S, Yang M, Yu K. 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell. 2012; 35(1):221-31. DOI: 10.1109/TPAMI.2012.59.

4.
Zuo Q, Zou L, Fan C, Li D, Jiang H, Liu Y. Whole and Part Adaptive Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition. Sensors (Basel). 2020; 20(24). PMC: 7763937. DOI: 10.3390/s20247149.

5.
Song H, Kim M, Park D, Shin Y, Lee J. Learning From Noisy Labels With Deep Neural Networks: A Survey. IEEE Trans Neural Netw Learn Syst. 2022; 34(11):8135-8153. DOI: 10.1109/TNNLS.2022.3152527.