KEY-SPARSE TRANSFORMER FOR MULTIMODAL SPEECH EMOTION RECOGNITION
Weidong Chen, Xiaofeng Xing, Xiangmin Xu, Jichen Yang, Jianxin Pang
SPS
Speech emotion recognition is a challenging research topic that plays a critical role in human-computer interaction. Multimodal inputs further improve performance because more emotional information becomes available. However, existing studies learn from all the information in a sample, while only a small portion of it relates to emotion; the redundant information becomes noise and limits system performance. In this paper, a key-sparse Transformer is proposed for efficient emotion recognition that focuses on emotion-related information. The proposed method is evaluated on the IEMOCAP and LSSED datasets, and experimental results show that it outperforms state-of-the-art approaches.
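The abstract does not spell out the sparsification mechanism, but one common way to make attention "key-sparse" is to keep, for each query, only the highest-scoring keys and mask the rest out before the softmax. The sketch below illustrates that idea with NumPy; the top-k selection rule, the function name `key_sparse_attention`, and all tensor shapes are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def key_sparse_attention(Q, K, V, k=2):
    """Scaled dot-product attention that attends to only the top-k keys
    per query, masking the remaining keys out before the softmax.
    Illustrative sketch; the paper's selection rule may differ."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (num_queries, num_keys)
    # Threshold at each query's k-th largest score; mask everything below it.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ V                      # (num_queries, d)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # e.g. 4 query frames
K = rng.standard_normal((6, 8))   # e.g. 6 key frames
V = rng.standard_normal((6, 8))
out = key_sparse_attention(Q, K, V, k=2)
print(out.shape)  # (4, 8)
```

Because masked positions receive a score of negative infinity, their softmax weights are exactly zero, so each output row is a convex combination of at most `k` value vectors rather than all of them.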