Listen To The Pixels

Sanjoy Chowdhury, Subhrajyoti Dasgupta, Sudip Das, Ujjwal Bhattacharya


Length: 00:08:03

22 Sep 2021

Jointly performing sound source separation and visual object segmentation in naturally occurring videos is a notoriously difficult task, especially in the absence of annotated data. In this study, we leverage the concurrency between the audio and visual modalities to address the joint audio-visual segmentation problem in a self-supervised manner. Human beings interact with the physical world through several sensory systems, such as vision, hearing, and movement. The usefulness of the interplay between these systems lies in the concept of degeneracy, which tells us that cross-modal signals can educate each other without an external supervisor. In this work, we efficiently exploit the fact that learning from one modality inherently helps to find patterns in the others by introducing a novel audio-visual fusion technique. To the best of our knowledge, we are also the first to address the task of segmenting partially occluded sound sources. Our study shows that the proposed model significantly outperforms existing state-of-the-art methods in both visual and audio source separation tasks.

Tags: Signal Processing Society, IEEE ICIP 2021, September 19-22, virtual conference


