Length: 12:51
04 May 2020

End-to-end approaches to automatic speech recognition benefit from modeling the probability of the word sequence given the input audio stream directly with a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. In addition, conventional systems have already been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to re-score the output of a first-pass ASR system. We show that learning to re-rank a list of potential ASR outputs is much simpler than learning to generate the hypothesis, and our model achieves up to 8% improvement in word error rate even when its training data is only a fraction of that used to train the first-pass system.
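
To illustrate the re-scoring idea, the sketch below shows a minimal attention-based re-ranker over an N-best list from a first-pass recognizer. It is not the authors' model: the names (NBestRescorer, rescore_nbest), the Transformer-encoder architecture, the mean pooling, and the interpolation weight are illustrative assumptions, and it presumes PyTorch is available.

```python
# Minimal sketch (assumed, not the paper's implementation) of an attention-based
# discriminative re-scorer that re-ranks first-pass ASR hypotheses.
import torch
import torch.nn as nn


class NBestRescorer(nn.Module):
    """Assigns a scalar score to one ASR hypothesis (a padded token-ID sequence)."""

    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.score_head = nn.Linear(d_model, 1)  # one scalar score per hypothesis

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (n_hypotheses, seq_len) -> scores: (n_hypotheses,)
        hidden = self.encoder(self.embed(token_ids))
        pooled = hidden.mean(dim=1)  # average-pool over the token dimension
        return self.score_head(pooled).squeeze(-1)


def rescore_nbest(model: NBestRescorer,
                  nbest_token_ids: torch.Tensor,
                  first_pass_scores: torch.Tensor,
                  weight: float = 0.5) -> int:
    """Interpolate first-pass scores with the discriminative scores and
    return the index of the best hypothesis in the N-best list."""
    with torch.no_grad():
        disc_scores = model(nbest_token_ids)
    combined = first_pass_scores + weight * disc_scores
    return int(torch.argmax(combined))
```

In a learning-to-rank setup such as the one described above, a model like this would be trained so that hypotheses with lower word error rate receive higher scores, then applied at decode time to pick the best entry from the first-pass N-best list.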
