MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization
Wujiang Xu (Xi'an Jiaotong University); Runzhong Wang (Shanghai Jiao Tong University); Xiaobo Guo (Ant Group); Shaoshuai Li (Ant Group); Qiongxu Ma (Ant Group); Yunan Zhao (Ant Group); Sheng Guo (Ant Group); Zhenfeng Zhu (Beijing Jiaotong University); Junchi Yan (Shanghai Jiao Tong University)
SPS
Video summarization is an essential problem in signal processing that aims to produce a concise summary of the original video. Existing approaches regard the task as a keyframe selection problem and generally construct the frame-wise representation by combining long-range temporal dependency with either unimodal or bimodal information. However, optimal keyframes should semantically summarize the whole content by exploiting both the multimodal and the shot-level hierarchical nature of videos, and existing methods do not fully exploit these properties. In this paper, we propose to construct a more powerful and robust frame-wise representation and to predict the frame-level importance score in a fair and comprehensive manner. Specifically, we propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, which enhances the frame-wise representation by combining all available multimodal information. We further design a hierarchical ShotConv network that produces an adaptive shot-aware frame-level representation accounting for both short-range and long-range temporal dependencies. Based on the learned shot-aware representations, MHSCNet predicts the frame-level importance score from both the local and the global view of the video. Extensive experiments on two standard video summarization datasets demonstrate that our proposed method consistently outperforms state-of-the-art methods.
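The pipeline the abstract describes (multimodal frame-wise features, shot-aware context at short and long range, then per-frame importance scoring) can be illustrated with a minimal numpy sketch. All names, dimensions, the fixed shot length, and the mean-pooling/linear-head choices below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: per-frame features from two modalities
# (dimensions are illustrative, not from the paper).
T, d_vis, d_txt = 12, 8, 4        # 12 frames in the toy video
visual = rng.standard_normal((T, d_vis))
caption = rng.standard_normal((T, d_txt))

# 1) Multimodal frame-wise representation: here, plain concatenation
#    stands in for the paper's learned fusion.
frames = np.concatenate([visual, caption], axis=1)   # (T, d_vis + d_txt)

# 2) Shot-aware context: fixed-length shots of 4 frames; each frame is
#    augmented with its shot mean (short-range dependency) and the
#    whole-video mean (long-range dependency).
shot_len = 4
shot_mean = frames.reshape(T // shot_len, shot_len, -1).mean(axis=1)
shot_ctx = np.repeat(shot_mean, shot_len, axis=0)    # (T, d)
video_ctx = np.broadcast_to(frames.mean(axis=0), frames.shape)

rep = np.concatenate([frames, shot_ctx, video_ctx], axis=1)

# 3) Frame-level importance: a random linear head with a sigmoid,
#    standing in for the learned predictor over local and global views.
w = rng.standard_normal(rep.shape[1])
scores = 1.0 / (1.0 + np.exp(-(rep @ w)))            # one score per frame
```

A summary would then keep the highest-scoring frames (or shots) up to a length budget; the actual model learns the fusion, the shot boundaries, and the scoring head end-to-end rather than using fixed pooling as above.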