ENHANCING CONTRASTIVE LEARNING WITH TEMPORAL COGNIZANCE FOR AUDIO-VISUAL REPRESENTATION GENERATION

Chandrashekhar Lavania, Shiva Sundaram, Sundararajan Srinivasan, Katrin Kirchhoff

10 May 2022

Audio-visual data allows us to leverage different modalities for downstream tasks. The idea is that the individual streams can complement each other in a given task, resulting in a model with improved performance. In this work, we present our experimental results on action recognition and video summarization tasks. The proposed modeling approach builds upon recent advances in contrastive loss based audio-visual representation learning. Temporally cognizant audio-visual discrimination is achieved in a Transformer model by learning with a masked feature reconstruction loss over a fixed time window in addition to learning via contrastive loss. Overall, our results indicate that the addition of temporal information significantly improves the performance of the contrastive loss based framework. We achieve an action classification accuracy of 66.2% versus the next best baseline at 64.7% on the HMDB dataset. For video summarization, we attain an F1 score of 43.5 versus 42.2 on the SumMe dataset.
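The abstract does not spell out implementation details, but the training objective it describes, a cross-modal contrastive loss combined with a masked feature reconstruction loss over a fixed time window, can be illustrated with a minimal PyTorch sketch. The symmetric InfoNCE formulation, the masking ratio, the zero-fill corruption, and the weighting term `lam` below are illustrative assumptions, not the authors' exact formulation; `encoder` stands in for a hypothetical Transformer mapping a (batch, time, dim) feature sequence to a sequence of the same shape.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch: matched audio/video clips are
    positives; all other pairings in the batch serve as negatives."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def masked_reconstruction_loss(features, encoder, mask_ratio=0.15):
    """Mask a random subset of time steps within the window, encode the
    corrupted sequence, and regress the original features at the masked
    positions. features: (B, T, D)."""
    B, T, _ = features.shape
    mask = torch.rand(B, T, device=features.device) < mask_ratio  # (B, T)
    corrupted = features.masked_fill(mask.unsqueeze(-1), 0.0)
    reconstructed = encoder(corrupted)              # (B, T, D)
    return F.mse_loss(reconstructed[mask], features[mask])

# Combined objective; lam is a hypothetical weighting hyperparameter:
#   loss = contrastive_loss(a_emb, v_emb) + lam * (
#          masked_reconstruction_loss(audio_feats, audio_encoder) +
#          masked_reconstruction_loss(video_feats, video_encoder))
```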
