Acoustic Modeling For Multi-Array Conversational Speech Recognition In The Chime-6 Challenge
Li Chai, Jun Du, Di-Yuan Liu, Yan-Hui Tu, Chin-Hui Lee
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 0:14:43
This paper presents our main contributions of acoustic modeling for multi-array multi-talker speech recognition in the CHiME-6 Challenge, exploring different strategies for acoustic data augmentation and neural network architectures. First, enhanced data from our front-end network preprocessing and spectral augmentation are investigated to be effective for improving speech recognition performance. Second, several neural network architectures are explored by different combinations of deep residual network (ResNet), factorized time delay neural network (TDNNF) and residual bidirectional long short-term memory (RBiLSTM). Finally, multiple acoustic models can be combined via minimum Bayes risk fusion. Compared with the official baseline acoustic model, the proposed solution can achieve a relatively word error rate reduction of 19% for the best single ASR system on the evaluation data, which is also one of main contributions to our top system for the Track 1 tasks of the CHiME-6 Challenge.