SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00
Length: 0:15:32
Date: 19 Jan 2021

Speech emotion recognition (SER) is important in enabling personalized services in our lives, and it has become a prevalent research topic given its potential to improve user experience across many modern speech technologies. However, the highly contextualized recording scenarios and the expensive emotion labeling required cause severe mismatch across emotional speech corpora, which are already limited in scale; this hinders the wide adoption of SER. In this work, instead of conventionally learning a common feature space between corpora, we take a novel approach: we enhance the variability of the source (labeled) corpus in a target (unlabeled) data-aware manner by generating synthetic source-domain data with a conditional cycle emotion generative adversarial network (CCEmoGAN). We evaluate our framework on cross-corpus emotion recognition tasks and obtain three-class valence recognition accuracies of 47.56% and 50.11%, and activation recognition accuracies of 51.13% and 65.7%, when transferring from the IEMOCAP to the CIT dataset and from the IEMOCAP to the MSP-IMPROV dataset, respectively. Visualizations of the augmented data space further illustrate how increasing target-domain-aware variability in the source domain improves emotion discriminability in cross-corpus emotion recognition.
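To make the CCEmoGAN objective concrete, the sketch below writes out the two loss terms a conditional cycle GAN typically combines: an adversarial term that pushes generated features toward the target domain, and a cycle-consistency term that forces a source-to-target-to-source round trip to recover the input (so emotion content is preserved). This is a minimal numpy illustration under assumed design choices (least-squares adversarial loss, L1 cycle loss, weight `lambda_cyc`); the function names and weights are hypothetical, not the authors' implementation.

```python
import numpy as np

def l1_cycle_loss(x_src, x_cycled):
    """Cycle-consistency: source -> target -> source should reconstruct
    the original features, preserving the conditioning emotion label."""
    return np.mean(np.abs(x_src - x_cycled))

def lsgan_loss(d_scores, real):
    """Least-squares adversarial loss on discriminator scores
    (assumed here; other GAN losses would also fit the framework)."""
    target = 1.0 if real else 0.0
    return np.mean((d_scores - target) ** 2)

def generator_objective(x_src, x_cycled, d_fake_scores, lambda_cyc=10.0):
    """Generator objective: fool the target-domain discriminator while
    the cycle term keeps the source content (and emotion) intact.
    lambda_cyc is an illustrative trade-off weight."""
    adv = lsgan_loss(d_fake_scores, real=True)
    cyc = l1_cycle_loss(x_src, x_cycled)
    return adv + lambda_cyc * cyc
```

A perfect round trip with a fully fooled discriminator drives the objective to zero; in practice the cycle term dominates early training, which is why it is usually weighted heavily.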