MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Guochang Zhang, Chunliang Wang, Libiao Yu, Jianqiang Wei

Length: 00:10:49

07 May 2022

Speech quality is often degraded by background noise and reverberation. A dense prediction network is typically used to reconstruct clean speech. In this work, a novel backbone for speech dense-prediction is proposed. With part of its input and output adjusted, the backbone is applied to the multi-channel speech enhancement task. To improve its performance, several strategies are designed: a multi-channel phase encoder, multi-scale temporal frequency processing, axial self-attention, and two-stage masking. The proposed method is evaluated on the datasets of the ICASSP 2022 L3DAS22 Challenge. Experimental results show that it outperforms previous state-of-the-art baselines by a large margin, and it ranked second in the L3DAS22 Challenge. The proposed backbone is also used for mono-channel speech enhancement and ranked first in both the ICASSP 2022 AEC and DNS Challenges (non-personal track).
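
The abstract names axial self-attention without describing it. As a rough illustration of the general technique only, and not the authors' implementation, the sketch below applies self-attention along one axis of a time-frequency feature map at a time. Several things here are assumptions made for brevity: the frequency-then-time ordering, the single head, the use of the inputs directly as queries, keys, and values (no learned projections), and the function names themselves.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Assumption: queries, keys, and values are the inputs themselves
    (no learned projections), which keeps the sketch short.
    """
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for query in seq:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in seq]
        weights = softmax(scores)
        out.append(
            [sum(w * value[d] for w, value in zip(weights, seq)) for d in range(dim)]
        )
    return out


def axial_self_attention(x):
    """Attend along the frequency axis, then along the time axis.

    x is a nested list indexed as x[t][f][c] (time, frequency, channel).
    The axis order is an assumption of this sketch.
    """
    n_time, n_freq = len(x), len(x[0])
    # Frequency axis: each time frame attends over its own frequency bins.
    y = [self_attention(frame) for frame in x]
    # Time axis: each frequency bin attends over all time frames.
    columns = [
        self_attention([y[t][f] for t in range(n_time)]) for f in range(n_freq)
    ]
    # columns is indexed [f][t][c]; transpose back to [t][f][c].
    return [[columns[f][t] for f in range(n_freq)] for t in range(n_time)]


# Toy input: 3 time frames, 4 frequency bins, 2 channels.
x = [[[float(t + f), 1.0] for f in range(4)] for t in range(3)]
out = axial_self_attention(x)
```

The output keeps the input's time, frequency, and channel dimensions. The usual motivation for the axial form is cost: with T frames and F bins, attending along one axis at a time scales as O(T·F·(T+F)) score computations rather than O((T·F)^2) for full attention over every time-frequency position.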

Tags:

axial self-attention

speech enhancement

speech dense-prediction

multi-scale
