Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

We adopt the recent cycle consistent generative adversarial network (MaskCycleGAN-VC) that allows converting a specific singing technique using only a few articulations of singing voice as examples. Since it is often prone to fail to preserve the content information of the singing voice due to distortion and noise during the conversion, a self-supervised learning module is proposed as the basic framework to enforce content consistency without additional annotations. We evaluate the proposed methods on three datasets that were commonly used in pop songs which involve singing techniques in terms of breathy voice, vibrato, and vocal fry. Experiments showed that our proposed methods outperform the baseline in terms of audio quality and content preservation, including melody and singer's timbral identity, without affecting the perception of singing techniques. \footnote{samples can be found in https://powei-c.github.io/STC/}

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00