CONTEXTUAL ADAPTERS FOR PERSONALIZED SPEECH RECOGNITION IN NEURAL TRANSDUCERS
Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:09:48
Personal rare-word recognition in End-to-End Automatic Speech Recognition (E2E ASR) models is a challenge due to the lack of training data. A standard way to address this issue is to apply shallow fusion methods at inference time. However, their performance is limited by their dependence on external language models and their deterministic approach to weight boosting. In this paper, we propose training neural contextual adapters for personalization in neural-transducer-based ASR models. Our approach not only biases towards a user's personal words, but also has the flexibility to work with pre-trained ASR models. Using an in-house dataset, we demonstrate that contextual adapters can be applied to any general-purpose pre-trained ASR model to improve personal rare-word recognition. Our method outperforms shallow fusion while retaining the functionality of the pre-trained models, because it does not alter any of the model weights. We further show that adapter-style training is superior to full fine-tuning of the ASR models on personalized data.
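To make the idea of biasing a frozen model concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the architecture from the paper: the abstract does not specify the adapter's internals, so the attention form, the all-zero "no bias" slot, and the names `biasing_adapter`, `encoder_out`, and `catalog_embeddings` are all hypothetical. A trained adapter would likely also include learned projection weights, which are omitted here. The sketch only shows the general pattern: an output of the pre-trained model is left untouched, and an additive bias computed from the user's catalog is layered on top.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def biasing_adapter(encoder_out, catalog_embeddings):
    """Add an attention-weighted catalog bias to one frozen encoder frame.

    encoder_out: one frame produced by the pre-trained model (never modified).
    catalog_embeddings: hypothetical embeddings of the user's personal words.
    Returns (biased_frame, attention_weights).
    """
    dim = len(encoder_out)
    # Assumption: slot 0 is an all-zero "no bias" entry, so the adapter can
    # abstain when no catalog word matches the audio.
    entries = [[0.0] * dim] + [list(e) for e in catalog_embeddings]
    scale = math.sqrt(dim)
    weights = softmax([dot(encoder_out, e) / scale for e in entries])
    # The bias is the attention-weighted sum of the catalog entries.
    bias = [sum(w * e[i] for w, e in zip(weights, entries)) for i in range(dim)]
    # The base output is only added to; none of the base weights change.
    biased = [h + b for h, b in zip(encoder_out, bias)]
    return biased, weights


frame = [1.0, 0.0]
catalog = [[2.0, 0.0], [0.0, 2.0]]  # two made-up personal-word embeddings
biased_frame, attention = biasing_adapter(frame, catalog)
```

With an empty catalog the only entry is the "no bias" slot, so the function returns the frame unchanged. This mirrors the abstract's point that the pre-trained model's behavior is retained when its weights are not altered.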