Poster 11 Oct 2023

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous NDVR methods rely heavily on pairwise labeled data, so they are limited by the scale of the available datasets and cannot optimize complex but efficient backbones, e.g., 3D transformers. To overcome this limitation, we explore self-supervised similarity learning for the NDVR task and propose FCS loss, a novel triplet loss, and ShotMix, a novel video-specific augmentation, which together significantly enhance self-supervised video similarity learning. Building on this, the compact 3D pipeline we propose shows a clear advantage in extracting global spatiotemporal dependencies in videos and achieves the best balance between efficiency and effectiveness. Furthermore, we propose PredMAE, which pretrains the 3D transformer using video prediction as a pretext task, to boost the downstream NDVR task without any human labels. Experiments on FIVR-200K and CC_WEB_VIDEO demonstrate the superiority and reliability of our method, which achieves state-of-the-art performance on clip-level NDVR. Code is released at https://github.com/dun-research/3D-CSL.
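
For readers unfamiliar with triplet-based similarity learning, the following is a minimal sketch of a generic cosine-similarity triplet loss. It is not the FCS loss, whose formulation the abstract does not give; the function names, the margin value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet hinge loss on cosine similarity (illustrative only).

    The loss is zero once the anchor is more similar to the positive than
    to the negative by at least `margin` (a hypothetical default value).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, sim_neg - sim_pos + margin)


# Toy clip embeddings: in a self-supervised setting the positive would
# typically be an augmented view of the anchor clip, and the negative an
# unrelated clip.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this toy case `easy` is zero because the positive already matches the anchor, while `hard` is large because the negative is closer to the anchor than the positive is.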
