Speech Intelligibility Enhancement Using Non-Parallel Speaking Style Conversion with StarGAN and Dynamic Range Compression
Gang Li, Ruimin Hu, Shanfa Ke, Rui Zhang, Xiaochen Wang, Li Gao
SPS
Length: 08:39
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments, typically applied at the listening stage of multimedia communications. In this study, we enhance speech intelligibility through speaking style conversion (SSC), a data-driven approach inspired by a vocal mechanism known as the Lombard effect. The proposed SSC method combines star generative adversarial network (StarGAN) based mapping with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion used in previous studies, StarGAN can learn the speech features of each gender separately, providing gender-specific conversion with a single model and non-parallel training data; 2) we design a multi-level enhancement strategy that uses DRC within the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms the baseline methods.
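The abstract does not specify the DRC stage, so the following is only a minimal sketch of generic dynamic range compression, not the authors' implementation. It assumes a simple feed-forward design: a one-pole envelope follower followed by a static gain curve. The function names, the threshold, the ratio, and the smoothing coefficients are all hypothetical values chosen for illustration. The RMS-matching helper reflects the assumption that the processed signal is compared with the original at equal energy.

```python
import math


def compress_dynamic_range(samples, threshold_db=-20.0, ratio=4.0,
                           attack_coeff=0.01, release_coeff=0.0005):
    """Illustrative compressor (all parameter values are assumptions)."""
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # One-pole envelope follower: rises quickly, decays slowly.
        coeff = attack_coeff if level > envelope else release_coeff
        envelope += coeff * (level - envelope)
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        # Above the threshold, each input dB yields only 1/ratio output dB.
        over_db = max(level_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


def match_rms(processed, reference):
    """Rescale `processed` so that its RMS equals that of `reference`."""
    def rms(signal):
        return math.sqrt(sum(v * v for v in signal) / len(signal))
    scale = rms(reference) / max(rms(processed), 1e-12)
    return [v * scale for v in processed]


if __name__ == "__main__":
    sample_rate = 16000
    # Synthetic test signal: a loud tone followed by a much quieter one.
    tone = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate)
            for n in range(8000)]
    signal = tone + [0.01 * v for v in tone]
    enhanced = match_rms(compress_dynamic_range(signal), signal)
    print(max(abs(v) for v in enhanced[6000:8000]))    # loud part, attenuated
    print(max(abs(v) for v in enhanced[14000:16000]))  # quiet part, boosted
```

After compression and energy matching, loud segments are attenuated relative to quiet ones, so low-energy portions of the signal carry a larger share of the total energy. This is the general motivation for using DRC on speech played back in noise.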