Learning A Subword Inventory Jointly With End-To-End Automatic Speech Recognition
Jennifer Drexler, James Glass
SPS
Length: 07:49
Recent work has demonstrated the promise of using subword units as output targets for sequence-to-sequence automatic speech recognition (ASR) models. Our work builds on the latent sequence decomposition (LSD) framework, in which the choice of subword units for ASR depends on both the speech input and the text output. In this paper, we follow the LSD method for using subword units but introduce an updated loss function that allows the ASR model to explicitly perform unit discovery as well. We show that our n-gram loss function outperforms the standard maximum likelihood loss within the LSD framework. We also show that uniform greedy sampling of subword units, which is much faster than LSD, is an effective decomposition strategy when combined with the n-gram loss. Along with quantitative results on the Wall Street Journal corpus, we present an analysis of the subword inventory learned by our model.
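To make the sampling idea concrete, the sketch below illustrates one plausible reading of uniform greedy sampling of subword decompositions: moving left to right through a word, at each position one unit is drawn uniformly at random from the inventory units that match the remaining string, with single characters always allowed as a fallback. The function name, the example inventory, and the fallback rule are illustrative assumptions, not the paper's exact procedure.

```python
import random


def uniform_greedy_sample(word, inventory, max_len=4, seed=None):
    """Sample one subword decomposition of `word`.

    Illustrative sketch only (hypothetical interface): at each position,
    choose uniformly among inventory units that match the remaining
    string; single characters are always valid so the loop terminates.
    """
    rng = random.Random(seed)
    units = []
    i = 0
    while i < len(word):
        # Candidate units: any inventory entry matching at position i,
        # plus the single character as a guaranteed fallback.
        candidates = [
            word[i:i + n]
            for n in range(1, min(max_len, len(word) - i) + 1)
            if n == 1 or word[i:i + n] in inventory
        ]
        piece = rng.choice(candidates)
        units.append(piece)
        i += len(piece)
    return units


# Example: repeated sampling yields different valid decompositions,
# each of which concatenates back to the original word.
inventory = {"sp", "ee", "ch", "spee"}
sample = uniform_greedy_sample("speech", inventory, seed=0)
```

Because the decomposition is sampled rather than fixed, the training loss can be marginalized (or approximated) over many such decompositions, which is what lets the model's preferences shape the learned inventory.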