Incorporating Written Domain Numeric Grammars Into End-To-End Contextual Speech Recognition Systems For Improved Recognition Of Numeric Sequences
Ben Haynor, Petar Aleksic
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 12:10
Accurate recognition of numeric sequences is crucial for many contextual speech recognition applications. For example, a user might create a calendar event and be prompted by a virtual assistant for the time, date, and duration of the event. We propose a modular and scalable solution for improved recognition of numeric sequences. We use finite state transducers built from written domain numeric grammars to increase the likelihood of hypotheses containing matching numeric entities during beam search in an end-to-end speech recognition system. Our technique yields a relative reduction in word error rate of up to 59% on a variety of numeric sequence recognition tasks, including times, percentages, and digit sequences.
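To make the idea concrete, the following is a minimal sketch of grammar-based biasing, not the authors' implementation. It assumes a toy character-level acceptor for written-domain times such as 7:30, and it rescores complete hypotheses after the fact rather than biasing during beam search as the abstract describes. The names build_time_acceptor, accepts, and rescore, the bonus value, and the example beam are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Acceptor = Tuple[Dict[Tuple[int, str], int], int, Set[int]]


def build_time_acceptor() -> Acceptor:
    """Hypothetical character-level acceptor for times like 7:30 or 12:05.

    States: 0 start, 1 one hour digit, 2 two hour digits,
    3 after the colon, 4 one minute digit, 5 final.
    Hours are not range-checked here (an assumption made for brevity).
    """
    arcs: Dict[Tuple[int, str], int] = {}
    for d in "0123456789":
        arcs[(0, d)] = 1  # first hour digit
        arcs[(1, d)] = 2  # optional second hour digit
        arcs[(4, d)] = 5  # second minute digit
    for d in "012345":
        arcs[(3, d)] = 4  # first minute digit must be 0-5
    arcs[(1, ":")] = 3
    arcs[(2, ":")] = 3
    return arcs, 0, {5}


def accepts(token: str, acceptor: Acceptor) -> bool:
    """Return True if the whole token is a path to a final state."""
    arcs, state, finals = acceptor
    for ch in token:
        nxt = arcs.get((state, ch))
        if nxt is None:
            return False
        state = nxt
    return state in finals


def rescore(beam: List[Tuple[str, float]], acceptor: Acceptor,
            bonus: float = 2.0) -> List[Tuple[str, float]]:
    """Add a log-score bonus per grammar-matching token, then re-sort."""
    rescored = []
    for text, log_prob in beam:
        matches = sum(accepts(tok, acceptor) for tok in text.split())
        rescored.append((text, log_prob + bonus * matches))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Made-up hypotheses and scores, for illustration only.
    beam = [("meet at seven thirty", -1.0),
            ("meet at 730", -1.2),
            ("meet at 7:30", -1.5)]
    for text, score in rescore(beam, build_time_acceptor()):
        print(f"{score:6.2f}  {text}")
```

In this sketch the hypothesis containing a token that the grammar accepts is promoted over hypotheses that scored higher before rescoring. A production system would more plausibly compile such grammars with an FST toolkit and apply the bonus incrementally as partial hypotheses are extended.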