Temporal Early Exiting for Streaming Speech Commands Recognition

Raphael Tang, Karun Kumar, Piyush Vyas, Gefei Yang, Yajie Mao, Craig Murray, Ji Xin, Jimmy Lin, Wenyan Li

Length: 00:07:21

11 May 2022

Limited-vocabulary speech commands recognition is the task of classifying a short utterance as one of several speech commands, for which neural networks obtain state-of-the-art results. In particular, recurrent neural networks represent a common approach for streaming commands recognition systems. In this paper, we explore resource-efficient methods to short-circuit such systems in the time domain when the model is confident in its prediction. We propose applying a frame-level labeling objective to further improve the efficiency-accuracy trade-off. On two datasets in limited-vocabulary commands recognition, our best method achieves an average time savings of 45% of the utterance without reducing the absolute accuracy by more than 0.6 points. We show that the per-instance savings depend on the length of the unique prefix in the phonemes across a dataset.
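The idea of short-circuiting a streaming recognizer in the time domain can be illustrated with a minimal sketch. It assumes the recurrent model emits one class-probability vector per audio frame and that "confident" means the top probability meets a fixed threshold. The abstract does not specify the exit criterion, so the threshold rule, the function name `early_exit`, and the parameter `threshold` are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import Sequence, Tuple


def early_exit(
    frame_probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> Tuple[int, int, float]:
    """Stop at the first frame whose top class probability meets `threshold`.

    Returns (predicted_class, frames_consumed, fraction_of_utterance_saved).
    The max-probability criterion is an assumption made for illustration.
    """
    if not frame_probs:
        raise ValueError("need at least one frame of probabilities")

    total = len(frame_probs)
    best, consumed = 0, 0
    for consumed, probs in enumerate(frame_probs, start=1):
        # Index of the most likely command at this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            break  # confident enough: skip the remaining frames

    # If no frame met the threshold, the whole utterance was consumed
    # and the prediction comes from the final frame.
    return best, consumed, 1.0 - consumed / total


# Toy stream of per-frame probabilities over three commands (made-up numbers).
stream = [
    [0.40, 0.30, 0.30],
    [0.50, 0.30, 0.20],
    [0.95, 0.03, 0.02],
    [0.99, 0.005, 0.005],
]
label, used, saved = early_exit(stream, threshold=0.9)
```

In this toy stream the model exits after the third of four frames, so a quarter of the utterance is skipped. Raising the threshold trades time savings for fewer premature exits, which is the efficiency-accuracy trade-off the abstract refers to.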

Tags:

speech commands

recurrent neural networks

early exiting

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
