Speech Intelligibility Enhancement Using Non-Parallel Speaking Style Conversion with StarGAN and Dynamic Range Compression

Gang Li, Ruimin Hu, Shanfa Ke, Rui Zhang, Xiaochen Wang, Li Gao

SPS

Length: 08:39

09 Jul 2020

Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments, typically applied at the listening stage of multimedia communications. In this study, we enhance speech intelligibility through speaking style conversion (SSC), a data-driven approach inspired by the Lombard effect, the involuntary increase in vocal effort that speakers exhibit in noise. The proposed SSC method combines mapping based on a star generative adversarial network (StarGAN) with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion of previous studies, StarGAN learns the speech features of each gender separately, providing gender-dependent conversion with a single model trained on non-parallel data; 2) we design a multi-level enhancement strategy that incorporates DRC into the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms the baseline methods.
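For readers unfamiliar with DRC, the sketch below shows a generic feed-forward compressor of the kind the abstract refers to: a smoothed level detector followed by a static threshold/ratio curve, so loud passages are attenuated and an optional makeup gain can then lift the whole signal. This is an illustrative assumption, not the authors' multi-level strategy, whose placement inside StarGAN is not described here; the function names `compress_gain_db` and `drc` and all parameter values (threshold, ratio, attack, release, makeup gain) are hypothetical.

```python
import math


def compress_gain_db(level_db: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Hard-knee static curve: return the gain in dB to apply at a given input level."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: leave the signal untouched
    # Above threshold the output rises only 1/ratio dB per input dB.
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db


def drc(signal, sample_rate=16000, threshold_db=-20.0, ratio=4.0,
        attack_ms=5.0, release_ms=50.0, makeup_db=0.0):
    """Hypothetical sample-by-sample compressor (illustrative parameters only)."""
    # One-pole smoothing coefficients for the envelope (level) detector.
    att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in signal:
        mag = abs(x)
        # Fast tracking when the level rises (attack), slow when it falls (release).
        coeff = att if mag > env else rel
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(env, 1e-10))
        gain_db = compress_gain_db(level_db, threshold_db, ratio) + makeup_db
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

With these settings a steady full-scale tone is pulled down by roughly 14 dB, while a quiet -40 dB signal passes through unchanged; adding `makeup_db` would then raise the compressed signal so its weak segments become more audible in noise.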

Tags:

icme 2020

sps conference

Value-Added Bundle(s) Including this Product

ICME 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 14-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
