
Learning interpretable filters in Wav-Unet for speech enhancement

Félix MATHIEU (Telecom Paris); Thomas Courtat (Thales); Gaël Richard (Telecom Paris, Institut polytechnique de Paris); Geoffroy Peeters (LTCI - Télécom Paris, IP Paris)

08 Jun 2023

Due to their performance, deep neural networks have become the dominant approach in nearly all modern audio processing applications. They can be used to estimate some parameters or hyperparameters of a model or, in some cases, to replace the entire model in an end-to-end fashion. Although deep learning can achieve state-of-the-art performance, it also suffers from inherent weaknesses: the resulting models usually remain complex and, to a large extent, non-interpretable. For instance, the internal filters used in each layer are chosen in an ad hoc manner, with only a loose relation to the nature of the processed signal. In this paper, we propose an approach to learn interpretable filters within a specific neural architecture, which allows a better understanding of the network's behaviour and reduces its complexity. We validate the approach on a speech enhancement task and show that the gain in interpretability does not degrade the performance of the model.
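The abstract does not detail how the filters are made interpretable. One common way to do this in waveform models such as Wave-U-Net is to parameterize each first-layer convolution kernel as a band-pass (sinc) filter with learnable cutoff frequencies, so that every learned filter can be read off directly as a frequency band. The sketch below illustrates that idea only; the class name `SincFilterBank`, its parameters, and all numeric choices are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: a Conv1d-style layer whose kernels are constrained to be
# sinc band-pass filters with learnable low/high cutoffs. This is NOT the
# paper's implementation -- only one common route to interpretable filters
# in a waveform encoder such as Wave-U-Net.
import torch
import torch.nn as nn


class SincFilterBank(nn.Module):
    """1-D filter bank whose kernels are parameterized sinc band-pass filters.

    Each filter has only two learnable parameters (low cutoff and bandwidth),
    so the learned filters can be inspected directly as frequency bands.
    """

    def __init__(self, n_filters=64, kernel_size=257, sample_rate=16000):
        super().__init__()
        assert kernel_size % 2 == 1, "use an odd kernel size for symmetry"
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate

        # Initialize cutoffs roughly uniformly on a linear frequency scale.
        low = torch.linspace(30.0, sample_rate / 2 - 200.0, n_filters)
        band = torch.full((n_filters,), 100.0)
        self.low_hz = nn.Parameter(low)    # learnable low cutoffs (Hz)
        self.band_hz = nn.Parameter(band)  # learnable bandwidths (Hz)

        # Fixed pieces: symmetric time axis and a Hamming window.
        n = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1).float()
        self.register_buffer("n", n)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):
        # Keep cutoffs inside the valid range [30 Hz, Nyquist].
        low = torch.clamp(torch.abs(self.low_hz), 30.0, self.sample_rate / 2)
        high = torch.clamp(low + torch.abs(self.band_hz), 30.0, self.sample_rate / 2)

        t = self.n / self.sample_rate  # time axis in seconds, shape (kernel_size,)

        def lowpass(fc):
            # Ideal low-pass impulse response 2*fc*sinc(2*fc*t), one row per filter.
            return 2 * fc.unsqueeze(1) * torch.sinc(2 * fc.unsqueeze(1) * t)

        # Band-pass response = difference of two low-pass responses, windowed.
        filters = (lowpass(high) - lowpass(low)) * self.window
        filters = filters / (filters.abs().sum(dim=1, keepdim=True) + 1e-8)
        filters = filters.unsqueeze(1)  # (n_filters, 1, kernel_size)

        # x: (batch, 1, time) waveform -> (batch, n_filters, time)
        return nn.functional.conv1d(x, filters, padding=self.kernel_size // 2)


# Usage: drop in place of the first convolution of a Wave-U-Net-style encoder.
layer = SincFilterBank(n_filters=32, kernel_size=129)
waveform = torch.randn(4, 1, 16000)   # four one-second clips at 16 kHz
features = layer(waveform)            # shape (4, 32, 16000)
```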
