18 Oct 2022

Holistic understanding of videos requires recognizing the overall scene beyond detecting foreground activity and objects. It provides valuable information for various video understanding tasks such as video summarization, scene change detection, and content filtering. While significant effort has been put into developing models for scene classification in images (e.g., Places365), video-level scene recognition is relatively nascent. The scope of this paper is to address the problem of going from image representations to video representations for scene classification. In particular, we compare self-supervised deep learning methods on the video scene recognition task using the HVU dataset. Starting from strong image-level scene representations, we train a video-level scene classifier with a triplet-based contrastive loss, and we propose triplet sampling strategies that aid the self-supervision. We compare the self-supervised techniques against the image-level scene representations as well as a weakly supervised classifier trained on image labels. We observe that the models learned with the self-supervised method outperform both baselines (with statistical significance), showing that the video-level scene representations retain the representative power of a competitive image-level scene recognition model trained on Places365, while offering benefits over weakly supervised techniques.
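The sketch below illustrates the general idea of training a video-level scene embedder from frame-level image features with a triplet-based contrastive loss. It is a minimal assumption-laden example, not the paper's method: the pooling, projection head, feature dimensions, and the way anchor/positive/negative triplets are sampled are placeholders for illustration only.

import torch
import torch.nn as nn

class VideoSceneEmbedder(nn.Module):
    """Pools per-frame image-level scene features into one video-level embedding.
    Architecture and dimensions are illustrative assumptions, not the paper's model."""
    def __init__(self, frame_dim=2048, embed_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(frame_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, frame_feats):            # frame_feats: (batch, num_frames, frame_dim)
        video_feat = frame_feats.mean(dim=1)   # simple average pooling over frames
        return nn.functional.normalize(self.proj(video_feat), dim=-1)

# One triplet-loss training step. Here the anchor and positive are assumed to come
# from the same video (or same scene), the negative from a different one; the actual
# triplet sampling strategies are proposed in the paper and not reproduced here.
model = VideoSceneEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

anchor   = torch.randn(8, 16, 2048)  # 8 clips x 16 frames of Places365-style frame features (random stand-ins)
positive = torch.randn(8, 16, 2048)
negative = torch.randn(8, 16, 2048)

loss = criterion(model(anchor), model(positive), model(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

The learned video-level embeddings could then feed a lightweight scene classifier, which is where the comparison against the image-level and weakly supervised baselines would take place.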
