A Human Activity Recognition Method Based on Vision Transformer

Overview

Journal Sci Rep

Specialty Science

Date 2024 Jul 3

PMID 38961136

Authors

Huiyan Han

Hongwei Zeng

Liqun Kuang

Xie Han

Hongxin Xue

Abstract

Human activity recognition has a wide range of applications, such as video surveillance, virtual reality and intelligent human-computer interaction, and it has emerged as a significant research area in computer vision. Graph convolutional networks (GCNs) have recently been widely used in these fields and have achieved strong performance. However, some challenges remain, including the over-smoothing problem caused by stacked graph convolutions and insufficient semantic correlation to capture large movements across time sequences. The Vision Transformer (ViT) is used in many 2D and 3D imaging tasks and has achieved surprisingly good results. In our work, we propose a novel human activity recognition method based on ViT (HAR-ViT). We integrate the enhanced AGCL (eAGCL) from 2s-AGCN into ViT so that it can process spatio-temporal data (3D skeletons) and make full use of spatial features. A position encoder module imposes an order on the unordered information, while the transformer encoder efficiently compresses the sequence features to speed up computation. Human activity recognition is then performed by a multi-layer perceptron (MLP) classifier. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on three widely used datasets: NTU RGB+D 60, NTU RGB+D 120 and Kinetics-Skeleton 400.
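The data flow described in the abstract (spatial graph convolution over skeleton joints, position encoding, a transformer encoder, then a classifier) can be illustrated with a toy sketch. This is not the authors' HAR-ViT implementation: the abstract does not specify eAGCL, the position encoder or the encoder layout, so the code below assumes a plain row-normalized graph convolution, the standard sinusoidal position encoding, a single attention layer with identity projections, and a single linear layer in place of the MLP. All function names, shapes and the random weights are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(edges, num_joints):
    """Row-normalized skeleton adjacency with self-loops (assumed, not eAGCL)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def graph_conv(frame, adj, weight):
    """One spatial graph convolution on a single frame: relu(A X W)."""
    mixed = matmul(matmul(adj, frame), weight)
    return [[max(0.0, v) for v in row] for row in mixed]

def positional_encoding(num_tokens, dim):
    """Standard sinusoidal position encoding, one row per token."""
    pe = []
    for pos in range(num_tokens):
        row = []
        for k in range(dim):
            angle = pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention (identity projections) with a residual."""
    dim = len(tokens[0])
    scores = matmul(tokens, [list(col) for col in zip(*tokens)])
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    attended = matmul(weights, tokens)
    return [[t + a for t, a in zip(tr, ar)] for tr, ar in zip(tokens, attended)]

def classify(skeleton_seq, edges, gcn_weight, head_weight):
    """Toy pipeline: graph conv per frame -> frame tokens -> attention -> linear head."""
    adj = normalized_adjacency(edges, len(skeleton_seq[0]))
    # One token per frame: flatten the per-joint spatial features.
    tokens = [[v for joint in graph_conv(frame, adj, gcn_weight) for v in joint]
              for frame in skeleton_seq]
    pe = positional_encoding(len(tokens), len(tokens[0]))
    tokens = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(tokens, pe)]
    encoded = self_attention(tokens)
    # Average over time, then apply a linear classifier (stand-in for the MLP).
    pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
    return softmax(matmul([pooled], head_weight)[0])

# Hypothetical toy input: 4 frames, 3 joints in a chain, 3D coordinates, 5 classes.
random.seed(0)
frames, joints, coords, hidden, classes = 4, 3, 3, 4, 5
skeleton = [[[random.uniform(-1, 1) for _ in range(coords)] for _ in range(joints)]
            for _ in range(frames)]
gcn_w = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(coords)]
head_w = [[random.uniform(-1, 1) for _ in range(classes)] for _ in range(joints * hidden)]
probs = classify(skeleton, [(0, 1), (1, 2)], gcn_w, head_w)
print(probs)
```

With untrained random weights the printed class probabilities are meaningless; the sketch only shows how a sequence of 3D skeleton frames could be turned into ordered tokens and passed through attention to a classifier.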
