CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE

Heming Wang, DeLiang Wang

Length: 00:09:56

12 May 2022

This paper proposes a novel cascade architecture for monaural speech enhancement. We leverage three domains of speech representation, namely magnitude, waveform, and complex spectrogram, to progressively suppress the background noise in noisy speech. The proposed neural cascade consists of three modules, each of which operates in a distinct speech representation and takes as input the original noisy signal together with the output of the previous module. During training, all modules are optimized jointly with a triple-domain loss. Experiments on the WSJ0 SI-84 corpus demonstrate that the proposed approach substantially outperforms previous baselines in terms of both speech quality and intelligibility.
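
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption: the module order (magnitude, then waveform, then complex spectrogram), the L1 form and equal weights of the loss, the STFT settings, and all function names (`stft`, `run_cascade`, `triple_domain_loss`) are hypothetical. The three modules are trivial stand-ins for neural networks.

```python
import numpy as np

def stft(x, n_fft=64, hop=32):
    """Hann-windowed STFT; returns a complex array (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def run_cascade(noisy_wave, mag_module, wave_module, complex_module):
    """Run three modules in sequence (order is an assumption).

    Each module receives the original noisy waveform plus the previous
    module's output and returns an estimate in its own domain.
    """
    est_mag = mag_module(noisy_wave, None)           # magnitude domain
    est_wave = wave_module(noisy_wave, est_mag)      # waveform domain
    est_spec = complex_module(noisy_wave, est_wave)  # complex domain
    return est_mag, est_wave, est_spec

def triple_domain_loss(est_mag, est_wave, est_spec, clean_wave,
                       weights=(1.0, 1.0, 1.0)):
    """Weighted sum of one L1 term per domain (assumed form of the loss)."""
    clean_spec = stft(clean_wave)
    mag_loss = np.mean(np.abs(est_mag - np.abs(clean_spec)))
    wave_loss = np.mean(np.abs(est_wave - clean_wave))
    complex_loss = np.mean(np.abs(est_spec.real - clean_spec.real)
                           + np.abs(est_spec.imag - clean_spec.imag))
    w_mag, w_wave, w_complex = weights
    return w_mag * mag_loss + w_wave * wave_loss + w_complex * complex_loss

# Stand-in modules: they pass the signal through unchanged. A real system
# would replace each with a trained network that also uses `prev`.
def identity_mag(noisy_wave, prev):
    return np.abs(stft(noisy_wave))

def identity_wave(noisy_wave, prev):
    return noisy_wave

def identity_complex(noisy_wave, prev):
    return stft(prev)

rng = np.random.default_rng(0)
clean = rng.standard_normal(512)
noisy = clean + 0.1 * rng.standard_normal(512)

outputs = run_cascade(noisy, identity_mag, identity_wave, identity_complex)
loss = triple_domain_loss(*outputs, clean)
print(f"triple-domain loss with pass-through modules: {loss:.4f}")
```

Because the loss sums a term for every module's output, its gradient would reach all three modules at once in a trainable version, which is one way to read "simultaneously optimizes all modules".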

Tags: complex domain, magnitude, cross-domain speech enhancement, speech enhancement, time domain