04 May 2020

Recent work has demonstrated the promise of using subword units as output targets for sequence-to-sequence automatic speech recognition (ASR) models. Our work builds on the latent sequence decomposition (LSD) framework, in which the choice of subword units for ASR depends on both the speech input and the text output. In this paper, we follow the LSD method for using subword units but introduce an updated loss function that also allows the ASR model to explicitly perform unit discovery. We show that our n-gram loss function outperforms the standard maximum likelihood loss within the LSD framework. We also show that uniform greedy sampling of subword units, which is much faster than LSD, is an effective decomposition strategy when combined with the n-gram loss. Along with quantitative results on the Wall Street Journal corpus, we present an analysis of the subword inventory learned by our model.
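The paper itself is not reproduced on this page, so the following is only a minimal sketch of what uniform greedy sampling of a subword decomposition could look like: at each character position, one unit is drawn uniformly from the inventory entries that match the upcoming characters. The function name, the toy inventory, and the max_len parameter are illustrative assumptions, not details taken from the paper.

```python
import random

def uniform_greedy_sample(word, inventory, max_len=4):
    """Hypothetical sketch: sample one decomposition of `word` by,
    at each position, choosing uniformly among inventory units that
    match the upcoming characters (at most `max_len` characters long)."""
    pieces, i = [], 0
    while i < len(word):
        # Collect every inventory unit that matches at position i.
        candidates = [word[i:i + n] for n in range(1, max_len + 1)
                      if word[i:i + n] in inventory]
        # Assumption: all single characters are in the inventory,
        # so candidates is never empty and the loop always advances.
        piece = random.choice(candidates)
        pieces.append(piece)
        i += len(piece)
    return pieces

# Toy example: repeated sampling yields different decompositions,
# e.g. ['he', 'll', 'o'] or ['h', 'e', 'llo'].
inventory = {"h", "e", "l", "o", "he", "ll", "llo", "hell"}
print(uniform_greedy_sample("hello", inventory))
```

Because each sample is drawn left to right in a single pass, this kind of sampling avoids the per-decomposition scoring that makes LSD-style marginalization expensive, which is consistent with the abstract's claim that uniform greedy sampling is much faster than LSD.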
