Rnn-Transducer With Stateless Prediction Network

Mohammadreza Ghodsi, Xiaofeng Liu, James Apfel, Rodrigo Cabrera, Eugene Weinstein

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 13:49

04 May 2020

The RNN-Transducer (RNNT) outperforms classic Automatic Speech Recognition (ASR) systems when a large amount of supervised training data is available. For low-resource languages, the RNNT models overfit, and can not directly take advantage of additional large text corpora as in classic ASR systems. We focus on the prediction network of the RNNT, since it is believed to be analogous to the Language Model (LM) in the classic ASR systems. We pre-train the prediction network with text-only data, which is not helpful. Moreover, removing the recurrent layers from the prediction network, which makes the prediction network stateless, performs virtually as well as the original RNNT model, when using wordpieces. The stateless prediction network does not depend on the previous output symbols, except the last one. Therefore it simplifies the RNNT architectures and the inference. Our results suggest that the RNNT prediction network does not function as the LM in classical ASR. Instead, it merely helps the model align to the input audio, while the RNNT encoder and joint networks capture both the acoustic and the linguistic information.

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Rnn-Transducer With Stateless Prediction Network

Mohammadreza Ghodsi, Xiaofeng Liu, James Apfel, Rodrigo Cabrera, Eugene Weinstein

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join an IEEE Society