Cascaded Time + Time-Frequency Unet For Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps
Arun Asokan Nair, Kazuhito Koishida
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:15:31
Speech enhancement aims to improve speech quality by eliminating noise and distortions. While most speech enhancement methods address signal independent additive sources of noise, several degradations to speech signals are signal dependent and non-additive, like speech clipping, codec distortions, and gaps in speech. In this work, we first systematically study and achieve state of the art results on each of these three distortions individually. Next, we demonstrate a neural network pipeline that cascades a time domain convolutional neural network with a time-frequency domain convolutional neural network to address all three distortions jointly. We observe that such a cascade achieves good performance while having the added benefit of keeping the action of each neural network component interpretable.
Chairs:
Ann Spriet