
Temporal Early Exiting for Streaming Speech Commands Recognition

Raphael Tang, Karun Kumar, Piyush Vyas, Gefei Yang, Yajie Mao, Craig Murray, Ji Xin, Jimmy Lin, Wenyan Li

Length: 00:07:21
11 May 2022

Limited-vocabulary speech commands recognition is the task of classifying a short utterance as one of several speech commands, for which neural networks obtain state-of-the-art results. In particular, recurrent neural networks represent a common approach for streaming commands recognition systems. In this paper, we explore resource-efficient methods to short-circuit such systems in the time domain when the model is confident in its prediction. We propose applying a frame-level labeling objective to further improve the efficiency-accuracy trade-off. On two datasets in limited-vocabulary commands recognition, our best method achieves an average time savings of 45% of the utterance without reducing the absolute accuracy by more than 0.6 points. We show that the per-instance savings depend on the length of the unique prefix in the phonemes across a dataset.
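
The idea of exiting early in the time domain can be illustrated with a minimal sketch. It assumes that some streaming model (for example, a recurrent network) emits a class probability distribution after every frame, and that "confident" means the top probability crosses a fixed threshold. The abstract does not state the confidence criterion the paper uses, so the thresholding rule, the function names `early_exit` and `time_savings`, and the default threshold of 0.9 are illustrative assumptions, not the authors' method.

```python
from typing import Sequence, Tuple


def early_exit(frame_probs: Sequence[Sequence[float]],
               threshold: float = 0.9) -> Tuple[int, int]:
    """Return (predicted_class, frames_consumed).

    frame_probs[t] is a hypothetical per-frame class distribution produced
    by a streaming model after seeing frames 0..t. Processing stops at the
    first frame whose top probability reaches the threshold (an assumed
    confidence rule); otherwise the whole utterance is consumed.
    """
    if not frame_probs:
        raise ValueError("frame_probs must contain at least one frame")
    best = 0
    for t, probs in enumerate(frame_probs, start=1):
        # Index of the most probable class at this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return best, t  # confident: stop listening here
    # Never confident enough: fall back to the final-frame prediction.
    return best, len(frame_probs)


def time_savings(frames_consumed: int, total_frames: int) -> float:
    """Fraction of the utterance that was skipped by exiting early."""
    return 1.0 - frames_consumed / total_frames


if __name__ == "__main__":
    # Toy two-class utterance with four frames (made-up numbers).
    probs = [[0.5, 0.5], [0.3, 0.7], [0.05, 0.95], [0.01, 0.99]]
    label, used = early_exit(probs, threshold=0.9)
    print(label, used, time_savings(used, len(probs)))
```

Raising the threshold trades time savings for accuracy: the model listens longer before committing, which is the efficiency-accuracy trade-off the abstract refers to.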
