FREQUENCY ENHANCEMENT NETWORK FOR EFFICIENT COMPRESSED VIDEO ACTION RECOGNITION

Yue Ming, Lu Xiong, Xia Jia, Qingfang Zheng, Jiangwan Zhou, Fan Feng, Nannan Hu

Poster 10 Oct 2023

Existing frequency-based action recognition methods achieve impressive efficiency gains. However, they ignore low-frequency texture and edge cues, which degrades accuracy. To address this problem, we propose a novel frequency enhancement (FE) block for efficient compressed video action recognition, comprising a temporal-channel two-heads attention (TCTHA) module and a frequency overlapping group convolution (FOGC) module. First, the TCTHA module uses attention to emphasize the inter-frame temporal context and the intra-frame informative frequency semantics. Then, the FOGC module groups channels from different frequency bands with overlap, extracting low-frequency texture and edge cues while maintaining interaction between groups. We integrate the FE block into 2D-CNNs with frequency-domain I-frame input to form FENet, which focuses on the pivotal low-frequency spatio-temporal semantics for action recognition. Experiments on HMDB-51, UCF-101, Kinetics-400, and Kinetics-700 verify that FENet achieves accuracy comparable to RGB-based methods while remaining highly efficient.
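The abstract does not give the band sizes or the amount of overlap used by the FOGC module, so the following is only a minimal sketch of the general idea of grouping channels with overlap, not the authors' implementation. The function name `overlapping_groups` and its parameters `num_channels`, `num_groups`, and `overlap` are hypothetical, and the sketch assumes the channels are already ordered from low to high frequency.

```python
def overlapping_groups(num_channels, num_groups, overlap):
    """Split channel indices into contiguous bands that share `overlap`
    channels with each neighbouring band (hypothetical helper, not from the paper)."""
    if num_groups <= 0 or num_channels % num_groups != 0:
        raise ValueError("num_channels must be a positive multiple of num_groups")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    band = num_channels // num_groups  # width of each non-overlapping band
    groups = []
    for g in range(num_groups):
        # Extend each band by `overlap` channels on both sides,
        # clipped to the valid channel range.
        start = max(0, g * band - overlap)
        end = min(num_channels, (g + 1) * band + overlap)
        groups.append(list(range(start, end)))
    return groups


if __name__ == "__main__":
    # 8 channels, 4 bands, 1 channel of overlap on each side.
    for indices in overlapping_groups(8, 4, 1):
        print(indices)
```

In this sketch, adjacent groups share channels, so a convolution applied per group would still see part of its neighbours' frequency band. With `overlap=0` the groups reduce to the disjoint partition used by an ordinary group convolution.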

Tags: action recognition, frequency domain, compressed videos
