Can the Production and Perception of Human Emotions Inspire Speech-Based Affective Computing?
Carlos Busso
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 01:02:36
Emotions play an important role in human-human interactions, influencing our decision-making processes, the manner in which we express ourselves, and how our interlocutors respond to us. Therefore, it is important to advance affective computing systems that analyze, recognize, and synthesize emotions using computational models. This talk will focus on speech emotion recognition (SER), although the observations discussed in the presentation are also relevant to other speech tasks in affective computing. The intrinsic variability with which we express and perceive emotions makes SER a unique and challenging research problem that differs from classical machine learning (ML) tasks. A key difference from other problems is that we do not have ground-truth labels describing the emotion actually felt by the speaker of a target sentence. Instead, the prevalent strategy relies on perceptual assessments collected from diverse evaluators, whose interpretations of the same sentence may differ. Some may therefore consider SER a noisy-label ML problem, given the inter-evaluator differences affecting the labels. However, our thesis is that the way we externalize and perceive emotions conveys valuable information that can inform the design of better speech-based affective computing technology. This talk will describe principled observations rooted in the production and perception of emotions that have direct implications for the design of SER systems, including (1) the ordinal nature of emotion, (2) the nonuniform externalization of emotions, (3) the specific modulation observed in speech for each emotional attribute, and (4) the intrinsic relation between speech and other modalities, such as facial expressions, which can be leveraged even when the ultimate goal is a speech-based system.
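To make the labeling issue above concrete, here is a minimal sketch, assuming hypothetical arousal ratings (1 = calm, 7 = excited) from five evaluators for three invented sentences. It contrasts a consensus label that keeps track of evaluator disagreement with an ordinal view that only records which sentence was perceived as more aroused. The data and the names `consensus`, `pairwise_preferences`, and `margin` are illustrative assumptions, not the method presented in the talk.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical ratings from five evaluators per sentence (invented for illustration).
ratings = {
    "sent_a": [5, 6, 4, 6, 5],
    "sent_b": [3, 2, 4, 3, 3],
    "sent_c": [5, 4, 6, 5, 7],
}

def consensus(scores):
    """Return the mean rating and the spread (population std) across evaluators."""
    return mean(scores), pstdev(scores)

def pairwise_preferences(ratings, margin=0.5):
    """Convert absolute ratings into ordinal (relative) labels.

    For each pair of sentences, emit (higher, lower) only when their mean
    ratings differ by more than `margin`; near-ties are skipped because the
    evaluators do not support a clear ordering.
    """
    prefs = []
    for s1, s2 in combinations(sorted(ratings), 2):
        m1, m2 = mean(ratings[s1]), mean(ratings[s2])
        if m1 - m2 > margin:
            prefs.append((s1, s2))
        elif m2 - m1 > margin:
            prefs.append((s2, s1))
    return prefs

if __name__ == "__main__":
    for name, scores in ratings.items():
        avg, spread = consensus(scores)
        print(f"{name}: mean={avg:.2f}, spread={spread:.2f}")
    print(pairwise_preferences(ratings))
    # prints [('sent_a', 'sent_b'), ('sent_c', 'sent_b')]
```

In this toy example, `sent_a` and `sent_c` have close mean ratings (5.2 vs. 5.4), so no preference is emitted for that pair, while both are clearly ranked above `sent_b`. The spread returned by `consensus` is kept rather than discarded, reflecting the idea that inter-evaluator differences carry information instead of being pure noise.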