A NEURAL PROSODY ENCODER FOR END-TO-END DIALOGUE ACT CLASSIFICATION

Kai Wei, Martin Radfar, Thanh Tran, Markus Mueller, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo, Dillon Knox

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:10:13

10 May 2022

Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrating prosodic features into end-to-end (E2E) DAC models, which infer dialogue acts directly from audio signals. In this work, we propose an E2E neural architecture that addresses the need to characterize prosodic phenomena co-occurring at different levels within an utterance. A novel component of this architecture is a learnable gating mechanism that assesses the importance of prosodic features and selectively retains the core information necessary for E2E DAC. Our proposed model improves dialogue act classification accuracy by 1.07% absolute across three publicly available benchmark datasets.
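The abstract does not specify the exact form of the gate. As a purely illustrative sketch, assuming a common sigmoid-gating design rather than the authors' actual formulation, a per-frame gate g = sigmoid(W_g [a; p] + b_g) can be computed from concatenated acoustic features a and prosodic features p, then multiplied element-wise onto p so that uninformative prosodic dimensions are suppressed before fusion. The names `gated_prosody_fusion`, `w_gate`, and `b_gate` are hypothetical, and the weights, which would be learned during training, are passed in as fixed values here.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_prosody_fusion(
    acoustic: Sequence[float],
    prosody: Sequence[float],
    w_gate: Matrix,
    b_gate: Sequence[float],
) -> Vector:
    """Hypothetical gate (an assumption, not the paper's definition):
    g = sigmoid(W_g [a; p] + b_g), output = [a; g * p].

    Each gate value lies in (0, 1) and scales one prosodic feature.
    """
    joint = list(acoustic) + list(prosody)  # concatenate [a; p]
    if any(len(row) != len(joint) for row in w_gate):
        raise ValueError("each gate row must match len(acoustic) + len(prosody)")
    if len(w_gate) != len(prosody) or len(b_gate) != len(prosody):
        raise ValueError("need one gate row and one bias per prosodic feature")
    gate = [sigmoid(z + b) for z, b in zip(matvec(w_gate, joint), b_gate)]
    gated = [g * p for g, p in zip(gate, prosody)]  # element-wise g * p
    return list(acoustic) + gated  # fused per-frame vector


if __name__ == "__main__":
    # Two acoustic features, two prosodic features (e.g. pitch statistics).
    weights = [[0.0] * 4, [0.0] * 4]
    print(gated_prosody_fusion([0.5, -1.0], [2.0, 3.0], weights, [0.0, -100.0]))
```

With zero gate weights the gate reduces to sigmoid(bias): a bias of 0 halves the first prosodic feature, while a large negative bias drives the second feature's contribution toward zero. In a trained model the gate would instead depend on the input frame, which is what lets it retain prosodic cues selectively.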

Tags:

end-to-end neural models

prosody

neural gating

dialogue act modeling

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS

DEEP LEARNING FOR PROMINENCE DETECTION IN CHILDREN'S READ SPEECH

PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH

Join an IEEE Society