
3D Network with Channel Excitation and Knowledge Distillation for Action Recognition

Overview
Date 2023 Apr 10
PMID 37033413
Abstract

Modern action recognition techniques frequently employ two networks: a spatial stream, which takes RGB frames as input, and a temporal stream, which takes optical flow as input. Recent work applies 3D convolutional neural networks (CNNs) with spatiotemporal filters to both streams. Although combining flow with RGB improves performance, accurate optical flow computation is expensive and adds latency to action recognition. In this study, we present a method for training a 3D CNN on RGB frames so that it mimics the motion stream and therefore does not require flow computation at test time. First, as an alternative to the squeeze-and-excitation (SE) block, we propose a channel excitation module (CE module). Experiments show that the CE module improves the feature extraction capability of a 3D network and outperforms the SE block. Second, for action recognition training, we adopt a linear combination of a knowledge-distillation loss and the standard cross-entropy loss to effectively leverage both appearance and motion information. The stream trained with this combined loss is called the Intensified Motion RGB Stream (IMRS). As a single stream, IMRS surpasses both the RGB and the Flow streams; on HMDB51, for example, IMRS achieves 73.5% accuracy, while the RGB and Flow streams achieve 65.6% and 69.1%, respectively. Extensive experiments confirm the effectiveness of the proposed method, and comparisons with other models show that it is competitive for action recognition.
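The abstract does not state the exact form of the distillation term or the mixing weight. As a minimal sketch, assuming a Hinton-style distillation loss (KL divergence between temperature-softened class distributions of a frozen Flow "teacher" and the RGB "student", scaled by T²) and a hypothetical weight `alpha`, a linear combination of the two losses for a single clip could look like this. All function names, `alpha`, and `temperature` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled log-softmax, using the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy of the student (RGB) logits against the ground-truth class."""
    return -log_softmax(logits)[label]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """Assumed KD term: KL(teacher || student) on softened distributions, times T^2."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return (temperature ** 2) * kl


def combined_loss(student_logits: Sequence[float],
                  teacher_logits: Sequence[float],
                  label: int,
                  alpha: float = 0.5,
                  temperature: float = 4.0) -> float:
    """Hypothetical linear mix: alpha * KD + (1 - alpha) * CE."""
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    ce = cross_entropy(student_logits, label)
    return alpha * kd + (1.0 - alpha) * ce
```

For example, `combined_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)` mixes both terms for one clip, where the first list stands in for RGB-stream logits and the second for frozen Flow-stream logits. Setting `alpha=0` recovers plain cross-entropy training of the RGB stream. A feature-level mean-squared-error term is another common choice of distillation loss; which one this paper uses is not stated in the abstract.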

Citing Articles

TL-CStrans Net: a vision robot for table tennis player action recognition driven via CS-Transformer.

Ma L, Tong Y. Front Neurorobot. 2024; 18:1443177.

PMID: 39498235 PMC: 11532032. DOI: 10.3389/fnbot.2024.1443177.


DILS: depth incremental learning strategy.

Wang Y, Han Z, Yu S, Zhang S, Liu B, Fan H. Front Neurorobot. 2024; 17:1337130.

PMID: 38260719 PMC: 10800709. DOI: 10.3389/fnbot.2023.1337130.

References
1. Ji S, Yang M, Yu K. 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell. 2012; 35(1):221-31. DOI: 10.1109/TPAMI.2012.59.

2. Dessalene E, Devaraj C, Maynord M, Fermuller C, Aloimonos Y. Forecasting Action Through Contact Representations From First Person Video. IEEE Trans Pattern Anal Mach Intell. 2021; 45(6):6703-6714. DOI: 10.1109/TPAMI.2021.3055233.

3. Donahue J, Hendricks L, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans Pattern Anal Mach Intell. 2016; 39(4):677-691. DOI: 10.1109/TPAMI.2016.2599174.

4. Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-Excitation Networks. IEEE Trans Pattern Anal Mach Intell. 2019; 42(8):2011-2023. DOI: 10.1109/TPAMI.2019.2913372.

5. Pock T, Urschler M, Zach C, Beichel R, Bischof H. A duality based algorithm for TV-L1-optical-flow image registration. Med Image Comput Comput Assist Interv. 2007; 10(Pt 2):511-8. DOI: 10.1007/978-3-540-75759-7_62.