Length: 12:51
04 May 2020

End-to-end approaches to automatic speech recognition benefit from modeling the probability of the word sequence given the input audio stream directly with a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. In addition, conventional systems have already been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to re-score the output of a first-pass ASR system. We show that learning to re-rank a list of potential ASR outputs is much simpler than learning to generate the hypothesis, and our model achieves up to 8% improvement in word error rate even when its training data is only a fraction of that used to train the first-pass system.
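
To illustrate the re-scoring idea, the sketch below shows a minimal attention-based re-ranker over an N-best list from a first-pass recognizer. It is not the authors' model: the names (NBestRescorer, rescore_nbest), the Transformer-encoder architecture, the mean pooling, and the interpolation weight are illustrative assumptions, and it presumes PyTorch is available.

```python
# Minimal sketch (assumed, not the paper's implementation) of an attention-based
# discriminative re-scorer that re-ranks first-pass ASR hypotheses.
import torch
import torch.nn as nn


class NBestRescorer(nn.Module):
    """Assigns a scalar score to one ASR hypothesis (a padded token-ID sequence)."""

    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.score_head = nn.Linear(d_model, 1)  # one scalar score per hypothesis

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (n_hypotheses, seq_len) -> scores: (n_hypotheses,)
        hidden = self.encoder(self.embed(token_ids))
        pooled = hidden.mean(dim=1)  # average-pool over the token dimension
        return self.score_head(pooled).squeeze(-1)


def rescore_nbest(model: NBestRescorer,
                  nbest_token_ids: torch.Tensor,
                  first_pass_scores: torch.Tensor,
                  weight: float = 0.5) -> int:
    """Interpolate first-pass scores with the discriminative scores and
    return the index of the best hypothesis in the N-best list."""
    with torch.no_grad():
        disc_scores = model(nbest_token_ids)
    combined = first_pass_scores + weight * disc_scores
    return int(torch.argmax(combined))
```

In a learning-to-rank setup such as the one described above, a model like this would be trained so that hypotheses with lower word error rate receive higher scores, then applied at decode time to pick the best entry from the first-pass N-best list.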
