Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:13:11
10 May 2022

Non-parallel voice conversion (VC) is a technique of transferring voice from one style to another without using a parallel corpus in model training. Various methods are proposed to approach non-parallel VC using deep neural networks. Among them, CycleGAN-VC and its variants have been widely accepted as benchmark methods. However, there is still a gap to bridge between the real target and converted voice and an increased number of parameters leads to slow convergence in training process. Inspired by recent advancements in unsupervised image translation, we propose a new end-to-end unsupervised framework U-GAT-VC that adopts a novel inter- and intra-attention mechanism to guide the voice conversion to focus on more important regions in spectrograms. We also introduce disentangle perceptual loss in our model to capture high-level spectral features. Subjective and objective evaluations shows our proposed model outperforms CycleGAN-VC2/3 in terms of conversion quality and voice naturalness.

More Like This

  • CIS
    Members: Free
    IEEE Members: Free
    Non-members: Free
  • CIS
    Members: Free
    IEEE Members: Free
    Non-members: Free
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00