Noise Level Limited Sub-Modeling For Diffusion Probabilistic Vocoders

Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai

Length: 00:10:05
09 Jun 2021

Although the diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time, high-fidelity speech synthesis and are trained with a simple loss function, a single model must predict the noise components over the full range of noise levels in every iteration. This paper proposes a simple but effective noise-level-limited sub-modeling framework for diffusion probabilistic vocoders, yielding Sub-WaveGrad and Sub-DiffWave. The proposed method also provides a DiffWave conditioned on a continuous noise level, as in WaveGrad, and spectral enhancement post-filtering. The proposed Sub-WaveGrad and Sub-DiffWave models are realized with 10 sub-models. These sub-models are trained separately on different limited ranges of noise levels, and only the sub-models required by the noise schedule are used at inference. Experiments on a Japanese female speech corpus indicate that both Sub-WaveGrad and Sub-DiffWave outperform vanilla WaveGrad and DiffWave in model accuracy and synthesis quality while maintaining inference speed.
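The idea of using only the sub-models that a noise schedule actually needs can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the noise level is the square root of the cumulative product of (1 - beta), following the WaveGrad convention, and that the 10 sub-models cover equal-width slices of the range [0, 1]. The abstract does not state how the range is split, so the equal-width split, the function names, and the example schedule are all hypothetical.

```python
import math

NUM_SUB_MODELS = 10  # count taken from the abstract; the split below is an assumption


def noise_levels(betas):
    """Return sqrt(cumulative product of (1 - beta)) for each diffusion step."""
    levels = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        levels.append(math.sqrt(alpha_bar))
    return levels


def sub_model_index(level, num_sub_models=NUM_SUB_MODELS):
    """Map a noise level in [0, 1] to an equal-width bin index (hypothetical split)."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("noise level must lie in [0, 1]")
    # A level of exactly 1.0 is clamped into the last bin.
    return min(int(level * num_sub_models), num_sub_models - 1)


def required_sub_models(betas, num_sub_models=NUM_SUB_MODELS):
    """Return the sorted indices of the sub-models a given schedule would touch."""
    return sorted({sub_model_index(lv, num_sub_models) for lv in noise_levels(betas)})


# Illustrative 6-step beta schedule, not taken from the paper.
example_betas = [1e-4, 1e-3, 1e-2, 5e-2, 2e-1, 5e-1]
print(required_sub_models(example_betas))  # prints [6, 8, 9]
```

Under these assumptions, the example schedule touches only three of the ten bins, so only three sub-models would need to be loaded at inference. Each step still runs a single network, which is consistent with the abstract's claim that inference speed is kept.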

Chairs:
Jiangyan Yi
