Acoustic Modeling for Multi-Array Conversational Speech Recognition in the CHiME-6 Challenge

Li Chai, Jun Du, Di-Yuan Liu, Yan-Hui Tu, Chin-Hui Lee


SPS


Length: 0:14:43

19 Jan 2021

This paper presents our main contributions to acoustic modeling for multi-array, multi-talker speech recognition in the CHiME-6 Challenge, exploring different strategies for acoustic data augmentation and neural network architectures. First, enhanced data from our front-end network preprocessing and spectral augmentation are investigated and shown to be effective in improving speech recognition performance. Second, several neural network architectures are explored through different combinations of a deep residual network (ResNet), a factorized time delay neural network (TDNNF), and a residual bidirectional long short-term memory network (RBiLSTM). Finally, multiple acoustic models are combined via minimum Bayes risk fusion. Compared with the official baseline acoustic model, the proposed solution achieves a relative word error rate reduction of 19% for the best single ASR system on the evaluation data, and it is also one of the main contributions to our top system for the Track 1 tasks of the CHiME-6 Challenge.
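The abstract names spectral augmentation but does not give its configuration. As a rough illustration only, the sketch below applies SpecAugment-style frequency and time masking to a feature matrix (frames x frequency bins). The function name `spec_augment` and every parameter value (number of masks, maximum mask widths, mask value) are hypothetical choices for this example, not the settings used by the authors, and the front-end enhancement network is not modeled here.

```python
import random
from typing import List, Optional

# Feature matrix laid out as frames x frequency bins (e.g. log-mel features).
Spectrogram = List[List[float]]


def spec_augment(spec: Spectrogram,
                 num_freq_masks: int = 2, max_freq_width: int = 8,
                 num_time_masks: int = 2, max_time_width: int = 20,
                 mask_value: float = 0.0,
                 rng: Optional[random.Random] = None) -> Spectrogram:
    """Return a masked copy of `spec`; the input is left unchanged.

    Hypothetical defaults, chosen for illustration only.
    """
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the caller's features are untouched
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # Frequency masking: overwrite a band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for t in range(num_frames):
            for f in range(start, start + width):
                out[t][f] = mask_value

    # Time masking: overwrite a block of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            for f in range(num_bins):
                out[t][f] = mask_value

    return out


# Usage: 100 frames x 40 bins of dummy features, seeded for reproducibility.
features = [[1.0] * 40 for _ in range(100)]
augmented = spec_augment(features, rng=random.Random(0))
```

Because masks are drawn independently on every call, each training epoch sees a differently masked copy of the same utterance, which is what makes this kind of masking act as data augmentation rather than a fixed preprocessing step.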

Tags: SPS conference, SLT 2021


Value-Added Bundle(s) Including this Product

SLT 2021 Virtual Conference - Presentation Videos Product Bundle
