CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE
Heming Wang, DeLiang Wang
SPS
Length: 00:09:56
This paper proposes a novel cascade architecture for monaural speech enhancement. We leverage three domains of speech representation, namely magnitude, waveform, and complex spectrogram, to progressively suppress the background noise in noisy speech. The proposed neural cascade architecture consists of three modules, each of which operates in a distinct speech representation on the original noisy input and the output of the previous module. During training, the network optimizes all modules simultaneously with a triple-domain loss. Experiments on the WSJ0 SI-84 corpus demonstrate that the proposed approach substantially outperforms previous baselines in terms of both speech quality and intelligibility.
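The data flow and the loss described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the names `cascade`, `toy_module`, `stft`, and `triple_domain_loss` are hypothetical, the modules are simple smoothing stand-ins rather than trained networks, the first module is fed the noisy input in place of a previous output, the loss uses L1 distances with equal weights, and it is applied only to the final output. The STFT is a naive rectangular-window DFT chosen to keep the example dependency-free.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: rectangular window and a direct DFT per frame (illustration only)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = [
            sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def mean_abs(values):
    """Mean of an iterable of non-negative distances."""
    values = list(values)
    return sum(values) / len(values)


def triple_domain_loss(estimate, clean, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of L1 distances in the waveform, complex, and magnitude domains.

    Assumption: the L1 form and equal weights are placeholders, not the paper's loss.
    """
    est_bins = [b for frame in stft(estimate) for b in frame]
    ref_bins = [b for frame in stft(clean) for b in frame]
    waveform = mean_abs(abs(e - c) for e, c in zip(estimate, clean))
    complex_spec = mean_abs(
        abs(e.real - c.real) + abs(e.imag - c.imag)
        for e, c in zip(est_bins, ref_bins)
    )
    magnitude = mean_abs(abs(abs(e) - abs(c)) for e, c in zip(est_bins, ref_bins))
    w_wav, w_cplx, w_mag = weights
    return w_wav * waveform + w_cplx * complex_spec + w_mag * magnitude


def toy_module(noisy, previous):
    """Hypothetical stand-in for a trained module: smooth, then blend with the input."""
    n = len(previous)
    smoothed = [
        (previous[max(i - 1, 0)] + previous[i] + previous[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]
    return [0.8 * s + 0.2 * x for s, x in zip(smoothed, noisy)]


def cascade(noisy, modules):
    """Run each module on the noisy input plus the previous module's output."""
    previous = noisy  # assumption: the first module receives the noisy input twice
    outputs = []
    for module in modules:
        previous = module(noisy, previous)
        outputs.append(previous)
    return outputs


# Synthetic example: a sinusoid with an alternating-sign disturbance.
clean = [math.sin(0.3 * n) for n in range(32)]
noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
outputs = cascade(noisy, [toy_module] * 3)
loss = triple_domain_loss(outputs[-1], clean)
```

In an actual system each entry in `modules` would be a separate network working in its own representation, and the loss would be backpropagated through all three at once; the sketch only shows how the noisy input is passed to every stage alongside the previous stage's output.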