Aligntts: Efficient Feed-Forward Text-To-Speech System Without Explicit Alignment

Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 11:58

04 May 2020

Targeting at both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates mel-spectrum from a sequence of characters, and the duration of each character is determined by a duration predictor. Instead of adopting the attention mechanism in Transformer TTS to align text to mel-spectrum, the alignment loss is presented to consider all possible alignments in training by use of dynamic programming. Experiments on the LJSpeech dataset show that our model achieves not only state-of-the-art performance which outperforms Transformer TTS by 0.03 in mean option score (MOS), but also a high efficiency which is more than 50 times faster than real-time.

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Aligntts: Efficient Feed-Forward Text-To-Speech System Without Explicit Alignment

Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join an IEEE Society