Deep Audio-Visual Speech Separation With Attention Mechanism

Chenda Li, Yanmin Qian

Length: 12:47

04 May 2020

Previous work has shown that audio-visual fusion is a practical approach to speech separation in the cocktail party problem. In this paper, we explore a better strategy for exploiting visual representations by means of an attention mechanism. Whereas the previous baseline uses only the visual stream of the target speaker, our model takes as input the visual streams of both speakers in the mixed audio and predicts both separated speech streams simultaneously. To further improve performance, we add an attention mechanism to the audio-visual speech separation architecture. The results show that the proposed approach performs well on audio-visual speech separation: our best model achieves a clear and consistent improvement over the conventional method, which uses only the target speaker's visual stream.
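The abstract does not specify how the attention mechanism is wired into the model, so the following is only a minimal sketch of one plausible reading. Audio frames act as queries over each speaker's visual frames through scaled dot-product attention, and the two attended visual contexts are concatenated with the audio features. The function names (`attend`, `fuse`), the feature sizes, and the choice of scaled dot-product attention are all assumptions made for illustration, not the authors' architecture. The separation network that would map the fused features to two speech streams is omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention (an assumed choice, not from the paper):
    # each query frame takes a weighted average of the value frames.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def fuse(audio, visual_a, visual_b):
    # Hypothetical fusion step: attend to BOTH speakers' visual streams,
    # then concatenate [audio | context_a | context_b] for every audio frame.
    ctx_a = attend(audio, visual_a, visual_a)
    ctx_b = attend(audio, visual_b, visual_b)
    return [a + ca + cb for a, ca, cb in zip(audio, ctx_a, ctx_b)]

# Toy inputs: 3 audio frames and 2 visual frames per speaker, feature size 2.
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
visual_a = [[1.0, 0.0], [0.0, 1.0]]
visual_b = [[0.2, 0.8], [0.9, 0.1]]
fused = fuse(audio, visual_a, visual_b)
```

Each fused frame keeps the audio features and appends one context vector per speaker, so a downstream network would see both speakers' visual cues at every audio frame. That property is what would allow it to predict the two separated streams jointly.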

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
