MULTIMODAL ATTENTION-MECHANISM FOR TEMPORAL EMOTION RECOGNITION
Esam Ghaleb, Jan Niehues, Stylianos Asteriadis
SPS
Length: 15:17
Exploiting the multimodal and temporal interaction between the audio and visual channels is essential for automatic audio-video emotion recognition (AVER). How strongly each modality and each time window of a video clip conveys an emotion can be exploited through a weighting scheme, such as an attention mechanism, to capture their complementary information. The attention mechanism is a powerful approach to sequence modeling and can be employed to fuse audio-video cues over time. We propose a novel framework that consists of bi-audio-visual time windows spanning short video clips labeled with discrete emotions. Attention is used to weight these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve improved multimodal emotion recognition.
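The idea of weighting time windows with attention can be illustrated with a minimal sketch. This is not the authors' architecture: it only shows generic softmax attention pooling over per-window feature vectors. The function names `softmax` and `attention_pool`, the `query` vector, and the toy feature values are all hypothetical; in a trained model the scoring parameters would be learned, and each window vector would come from audio and visual encoders.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(windows, query):
    """Fuse per-window feature vectors into one clip-level vector.

    windows: list of feature vectors, one per time window
             (assumed here to be concatenated audio + visual features).
    query:   scoring vector (hypothetical; learned in a real model).
    """
    # Score each window by its dot product with the query vector.
    scores = [sum(w_i * q_i for w_i, q_i in zip(w, query)) for w in windows]
    weights = softmax(scores)
    dim = len(windows[0])
    # Weighted sum of the window vectors, dimension by dimension.
    fused = [sum(a * w[d] for a, w in zip(weights, windows)) for d in range(dim)]
    return fused, weights


# Toy clip with three time windows (illustrative values only).
windows = [
    [0.1, 0.3, 0.0, 0.2],
    [0.9, 0.8, 0.7, 0.6],
    [0.2, 0.1, 0.3, 0.0],
]
query = [1.0, 1.0, 1.0, 1.0]

fused, weights = attention_pool(windows, query)
```

Here the second window receives the largest weight because it has the highest score, so it dominates the fused clip-level vector that a classifier over discrete emotions would consume.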