SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, Shinji Watanabe

Length: 00:13:24

12 May 2022

The Transformer has been widely adopted as the dominant architecture for most sequence transduction tasks, including automatic speech recognition (ASR), because its attention mechanism excels at capturing long-range dependencies. While models built solely on attention parallelize better than conventional RNNs, a novel network architecture, SRU++, has recently been proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong sequence-modeling capability and achieves near-state-of-the-art results on various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, which is competitive with the state-of-the-art Conformer encoder under the same setup. Notably, our analysis shows that SRU++ surpasses Conformer by a large margin on long-form speech input.
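The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real ASR scoring usually normalizes text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a reported WER of 2.0% corresponds to roughly two word errors per hundred reference words.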

Tags: speech recognition, sru, recurrent neural network, attention

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
