A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:15:54

09 May 2022

Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled with the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle the speech and noise latent variables from the observed signal. As a result, the proposed method can apply the VAE to model both speech and noise signals, which differs fundamentally from previous VAE-based SE works. More specifically, the proposed DRL method learns to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method not only disentangles the speech and noise latent variables from the observed signal but also obtains a higher scale-invariant signal-to-distortion ratio (SI-SDR) and speech quality score than a similar deep neural network (DNN)-based SE method.
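The abstract reports results in terms of scale-invariant signal-to-distortion ratio (SI-SDR). As a minimal sketch, assuming the commonly used definition (rescale the clean reference s by alpha = <s_hat, s> / ||s||^2, then take 10 log10 of the energy of alpha * s over the energy of the residual s_hat - alpha * s) and the common convention of removing the mean from both signals first, the metric can be computed as below. The function name `si_sdr` and the `eps` guard are illustrative choices and are not taken from the paper.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB (higher is better). Illustrative sketch only."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Assumed convention: remove the DC offset from both signals.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)

    # Split the estimate into a scaled target and a residual distortion.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    # eps keeps the ratio finite when the residual is (numerically) zero.
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Usage: compare an enhanced signal against its clean reference.
clean = [1.0, -2.0, 3.0, -1.0, 0.5, -1.5]
enhanced = [1.1, -1.9, 2.9, -1.1, 0.6, -1.6]
score_db = si_sdr(enhanced, clean)
```

Because the reference is rescaled before the ratio is taken, multiplying the estimate by a constant gain leaves the score unchanged, which is why SI-SDR is often preferred over plain SDR when an enhancement system may alter the output level.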

Tags:

deep representation learning

variational autoencoder

speech enhancement

bayesian permutation training

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Short Course Bundle: ICASSP 2022 COURSE 5: Speech Technology for Health: From Technical Foundations to Applications (Parts 1-3)

Audio Signal Enhancement: A Weakly Supervised Deep Learning Approach

Diffusion Models for Speech Enhancement and Restoration
