Emotional Voice Conversion Using Multitask Learning With Text-To-Speech

Tae-Ho Kim, Shinkook Choi, Sejik Park, Soo-Young Lee, Sungjae Cho

Length: 12:53

04 May 2020

Voice conversion (VC) is the task of altering a person's voice to suit different styles while preserving the linguistic content. The previous state of the art in VC was based on the sequence-to-sequence (seq2seq) model, which can lose linguistic information. An earlier attempt to overcome this problem used textual supervision; however, it required explicit alignment, which forfeited the benefit of using a seq2seq model. This study presents a voice converter that uses multitask learning with text-to-speech (TTS). With multitask learning, VC is expected to capture linguistic information while preserving training stability. The method does not require explicit alignment to capture abundant text information. VC experiments were performed on a male Korean emotional text-speech dataset, converting neutral voice to emotional voice. The results showed that multitask learning helps preserve the linguistic content.
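
The multitask idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the training objective, so everything here is an assumption: the function names, the use of an L1 reconstruction loss, and the weighted-sum form are a generic multitask pattern, not the authors' formulation. The sketch assumes that a VC branch and a TTS branch each predict the same target speech features and that their losses are added together.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length feature sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    vc_pred: Sequence[float],
    tts_pred: Sequence[float],
    target: Sequence[float],
    tts_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    """Weighted sum of the VC and TTS reconstruction losses.

    Both branches are scored against the same target speech features,
    so the text-driven TTS task acts as auxiliary supervision for VC.
    """
    vc_loss = l1_loss(vc_pred, target)
    tts_loss = l1_loss(tts_pred, target)
    return vc_loss + tts_weight * tts_loss
```

In a real system the predictions would be spectrogram frames produced by neural networks, and `tts_weight` would be a tuning choice; the sketch only shows how two task losses combine into one objective.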

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
