MULTI-SAMPLE SUBBAND WAVERNN VIA MULTIVARIATE GAUSSIAN

Hiroki Kanagawa, Yusuke Ijima

Length: 00:14:46

13 May 2022

This paper proposes a high-speed neural vocoder for CPU implementation. Two approaches to speeding up autoregressive neural vocoders have been proposed: 1) simultaneous generation of multiple samples and 2) subband signal-based vocoding. So far, they have been employed independently. Our neural vocoder is extremely fast because it generates multiple samples of subband signals simultaneously. Although there is an association between the subband signals, the conventional subband-based vocoder can degrade quality because each subband signal is generated from an independent probability distribution. To overcome this problem, we also introduce waveform generation that accounts for the association between subbands by employing a multivariate Gaussian. Experiments show that 1) the proposed method is 1.81 times as fast as the conventional subband WaveRNN on a single-threaded CPU, and 2) it outperforms the conventional method in a subjective evaluation of naturalness, achieving a mean opinion score (MOS) of 4.08 on a text-to-speech task.
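The abstract does not give the network architecture or the exact parameterization, so the following is only a minimal sketch of the underlying idea: drawing one value per subband jointly from a multivariate Gaussian, compared with drawing each subband independently. The mean vector, covariance matrix, and all function names here are hypothetical illustrations, not the authors' implementation; in a real vocoder these parameters would presumably be predicted by the network at each step.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov: symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - acc)
            else:
                low[i][j] = (cov[i][j] - acc) / low[j][j]
    return low


def sample_joint(mean, low, rng):
    """Draw x = mean + L z with z ~ N(0, I): one value per subband, drawn jointly."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def correlation(samples, a, b):
    """Empirical Pearson correlation between components a and b."""
    n = len(samples)
    ma = sum(s[a] for s in samples) / n
    mb = sum(s[b] for s in samples) / n
    cab = sum((s[a] - ma) * (s[b] - mb) for s in samples)
    caa = sum((s[a] - ma) ** 2 for s in samples)
    cbb = sum((s[b] - mb) ** 2 for s in samples)
    return cab / math.sqrt(caa * cbb)


# Hypothetical parameters for two subbands (illustrative values only).
MEAN = [0.0, 0.0]
COV = [[1.0, 0.8], [0.8, 1.0]]

rng = random.Random(0)
low = cholesky(COV)
joint = [sample_joint(MEAN, low, rng) for _ in range(20000)]

# Independent baseline: same variances, off-diagonal covariance dropped.
diag = cholesky([[1.0, 0.0], [0.0, 1.0]])
indep = [sample_joint(MEAN, diag, rng) for _ in range(20000)]

print(round(correlation(joint, 0, 1), 2), round(correlation(indep, 0, 1), 2))
```

In this toy setting, the jointly drawn samples retain the cross-subband correlation specified by the covariance, while the independent baseline discards it. This is the kind of inter-subband association the abstract says independent per-subband distributions fail to model.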

Tags:

subband signal

speech synthesis

multivariate gaussian

neural vocoder

multi-sample generation

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
