LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Dong-Lai Wei, Yang Liu, Jing Liu, Xin-Hua Zeng, Chen-Geng Liu, Xiao-Guang Zhu

SPS

Length: 00:07:49

09 May 2022

Violence detection is an essential and challenging problem in the computer vision community. Most existing works focus on single-modal data analysis, which is ineffective when multiple modalities are available. We therefore propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts multiple instance learning strategies to refine video-level hard labels into clip-level soft labels, and 2) the second stage uses a multi-modal fused attention module to fuse the modalities, with supervised learning carried out using the soft labels generated in the first stage. Extensive experiments on the XD-Violence dataset show that our method outperforms state-of-the-art methods.
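The two-stage pipeline in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Stage 1 assumes a linear clip scorer trained with a top-k multiple instance learning objective (a common MIL choice, assumed here), whose clip scores serve as soft labels for violent videos while normal videos get all-zero labels. Stage 2 assumes a simple softmax attention over each clip's visual and audio features, with both modalities already projected to a shared dimension `d`. All function names (`train_mil_scorer`, `soft_labels`, `fused_attention`, `soft_bce`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# ---- Stage 1 (assumed): top-k MIL turns video-level hard labels into clip-level soft labels ----
def train_mil_scorer(videos, labels, k=3, lr=0.1, epochs=300, seed=0):
    """Fit a hypothetical linear clip scorer with a top-k MIL objective.

    videos: list of (T, d) clip-feature arrays; labels: 0/1 video-level labels.
    The video score is the mean of the k highest clip scores, trained with BCE.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=videos[0].shape[1])
    b = 0.0
    for _ in range(epochs):
        for X, y in zip(videos, labels):
            s = sigmoid(X @ w + b)                        # clip scores in (0, 1)
            idx = np.argsort(s)[-k:]                      # indices of the top-k clips
            p = np.clip(s[idx].mean(), 1e-6, 1 - 1e-6)    # video-level score
            dL_dp = (p - y) / (p * (1 - p))               # gradient of BCE w.r.t. p
            ds = dL_dp / k * s[idx] * (1 - s[idx])        # back through the mean and sigmoid
            w -= lr * X[idx].T @ ds
            b -= lr * ds.sum()
    return w, b


def soft_labels(X, y, w, b):
    """Clip-level soft labels: scorer outputs for violent videos, zeros for normal ones."""
    return sigmoid(X @ w + b) if y == 1 else np.zeros(len(X))


# ---- Stage 2 (assumed): attention-based fusion of visual and audio clip features ----
def fused_attention(visual, audio, Wq, Wk):
    """Weight each clip's two modality features by softmax attention, then sum them.

    visual, audio: (T, d) arrays (assumed to share the dimension d).
    Wq, Wk: (d, d) query/key projections. Returns fused (T, d) and weights (T, 2).
    """
    feats = np.stack([visual, audio], axis=1)             # (T, 2, d)
    q = feats.mean(axis=1) @ Wq                           # one query per clip
    keys = feats @ Wk                                     # one key per modality
    scores = np.einsum("td,tmd->tm", q, keys) / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)             # softmax over the two modalities
    fused = np.einsum("tm,tmd->td", alpha, feats)         # attention-weighted sum
    return fused, alpha


def soft_bce(pred, soft):
    """Binary cross-entropy against stage-1 soft labels (the stage-2 supervision)."""
    pred = np.clip(pred, 1e-6, 1 - 1e-6)
    return float(-np.mean(soft * np.log(pred) + (1 - soft) * np.log(1 - pred)))
```

In a full model, `fused` would feed a clip classifier whose outputs are trained with `soft_bce` against the stage-1 soft labels. Top-k pooling and modality-level attention are illustrative choices here; the paper's fused attention module may be structured differently.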

Tags: fused attention, multi-modal information, deep learning, weak supervision, violence detection

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle