A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

Nathan Howard, Alex Park, Turaj Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

Length: 00:09:32

11 Jun 2021

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs. Previous work has proposed building acoustic echo cancellation (AEC) models for this task that optimize speech enhancement metrics, using both neural network and signal processing approaches. Since our goal is to recognize the input speech, we consider enhancements which improve word error rates (WERs) when the predicted speech signal is passed to an automatic speech recognition (ASR) model. First, we augment the loss function with a term that encourages outputs useful to a pre-trained ASR model and show that this augmented loss function improves WER metrics. Second, we demonstrate that augmenting our training dataset of real-world examples with a large synthetic dataset improves performance. Crucially, applying SpecAugment-style masks to the reference channel during training aids the model in adapting from synthetic to real domains. In experimental evaluations, we find the proposed approaches improve performance, on average, by 57% over a signal processing baseline and 45% over the neural AEC model without the proposed changes.
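
The two training-side ideas in the abstract, masking the reference channel and adding an ASR term to the loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the (frames x bins) feature layout, the mask counts and widths, and the weight `asr_weight` are all hypothetical choices made for illustration, and the abstract does not specify any of them.

```python
import numpy as np


def mask_reference(ref_spec, num_freq_masks=2, max_freq_width=8,
                   num_time_masks=2, max_time_width=20, rng=None):
    """Zero out random frequency and time bands of a reference-channel
    spectrogram (shape: frames x bins), in the style of SpecAugment.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = ref_spec.copy()  # leave the caller's array untouched
    n_frames, n_bins = out.shape

    # Frequency masks: blank a random band of adjacent bins in every frame.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, n_bins) + 1))
        start = int(rng.integers(0, n_bins - width + 1))
        out[:, start:start + width] = 0.0

    # Time masks: blank a random run of adjacent frames across all bins.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        out[start:start + width, :] = 0.0

    return out


def combined_loss(enhancement_loss, asr_loss, asr_weight=0.1):
    """Weighted sum of an enhancement loss and an ASR-derived loss.

    The weighted-sum form and the default weight are assumptions.
    """
    return enhancement_loss + asr_weight * asr_loss
```

In a training loop along these lines, `mask_reference` would be applied only to the reference (playback) features before they are fed to the AEC model, while the microphone features are left unchanged.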

Chairs:

Ann Spriet

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021
