NSV-TTS: NON-SPEECH VOCALIZATION MODELING AND TRANSFER IN EMOTIONAL TEXT-TO-SPEECH

Haitong Zhang (NetEase Games AI Lab); Xinyuan Yu (NetEase Games AI Lab); Yue Lin (NetEase Games AI Lab)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
07 Jun 2023

This paper addresses non-speech vocalization (NSV) modeling and transfer in emotional text-to-speech (TTS). We propose an emotional TTS system, NSV-TTS, that models both NSVs and emotional speech. The model uses self-supervised learning to extract unsupervised linguistic units (ULUs), which are used for NSV labeling and zero-shot NSV transfer. We further propose token mixing and random masking to improve performance. We evaluate the proposed method on various NSV types and emotion classes, and the experimental results show that it performs well on the zero-shot NSV transfer task. Lastly, we conduct ablation studies to investigate the proposed method further.
