END-TO-END NEURAL SPEECH CODING FOR REAL-TIME COMMUNICATIONS

Xue Jiang, Chengyu Zheng, Yuan Zhang, Xiulian Peng, Huaying Xue, Yan Lu

SPS

Length: 00:12:10

12 May 2022

Deep-learning-based methods have shown advantages over traditional methods in audio coding, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It adopts an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, through end-to-end optimization, TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for the three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.
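The abstract describes an encoder, temporal filtering, decoder pipeline in which short-term and long-term temporal modules are interleaved. The sketch below is a minimal, hypothetical illustration of that general idea only, not the TFNet architecture. It assumes a frame-wise linear encoder and decoder, a causal 1-D convolution as the short-term module, and a plain recurrent layer as the long-term module, with random weights and no quantization, training, enhancement, or packet loss concealment. All function names, shapes, and sizes are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy model: encoder -> interleaved temporal filtering -> decoder.
# Module choices and sizes are assumptions for illustration, not the TFNet design.

def causal_conv(x, w):
    """Short-term module (assumed): causal 1-D convolution over frames.

    x: (T, C) frame features; w: (K, C, C) kernel.
    Output frame t depends only on input frames t-K+1 .. t.
    """
    K = w.shape[0]
    T, C = x.shape
    padded = np.vstack([np.zeros((K - 1, C)), x])  # left-pad only: no look-ahead
    y = np.empty_like(x)
    for t in range(T):
        y[t] = np.einsum("kc,kcd->d", padded[t:t + K], w)
    return np.tanh(y)


def recurrent(x, w_in, w_rec):
    """Long-term module (assumed): a plain recurrence whose state summarizes all past frames."""
    h = np.zeros(w_rec.shape[0])
    out = np.empty((x.shape[0], w_rec.shape[0]))
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ w_in + h @ w_rec)
        out[t] = h
    return out


def temporal_filter(z, blocks):
    """Interleave short-term and long-term modules, each with a residual connection."""
    for w_conv, (w_in, w_rec) in blocks:
        z = z + causal_conv(z, w_conv)
        z = z + recurrent(z, w_in, w_rec)
    return z


def codec_forward(frames, enc, dec, blocks):
    """Frame-wise linear encoder, temporal filtering, frame-wise linear decoder.

    Quantization of the latent z is omitted in this sketch.
    """
    z = np.tanh(frames @ enc)
    z = temporal_filter(z, blocks)
    return z @ dec


rng = np.random.default_rng(0)
F, C, K, T = 16, 8, 3, 20  # feature dim, latent dim, kernel size, number of frames
enc = 0.3 * rng.standard_normal((F, C))
dec = 0.3 * rng.standard_normal((C, F))
blocks = [
    (0.2 * rng.standard_normal((K, C, C)),
     (0.2 * rng.standard_normal((C, C)), 0.2 * rng.standard_normal((C, C))))
    for _ in range(2)
]
frames = rng.standard_normal((T, F))
out = codec_forward(frames, enc, dec, blocks)

# Causality check: perturbing frames from index 10 onward leaves outputs 0..9 unchanged.
perturbed = frames.copy()
perturbed[10:] += 1.0
out2 = codec_forward(perturbed, enc, dec, blocks)
print(out.shape, np.allclose(out[:10], out2[:10]))  # prints: (20, 16) True
```

Because both assumed modules use only current and past frames, changing later frames leaves earlier outputs untouched; this causal structure is one way to avoid look-ahead delay in an RTC setting, while the recurrent state lets information from distant frames still reach the current output.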

Tags:

real-time communications

neural audio coding

speech enhancement

packet loss concealment

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Diffusion Models for Speech Enhancement and Restoration

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

Slides: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
