Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 0:14:43
19 Jan 2021

This paper presents our main contributions of acoustic modeling for multi-array multi-talker speech recognition in the CHiME-6 Challenge, exploring different strategies for acoustic data augmentation and neural network architectures. First, enhanced data from our front-end network preprocessing and spectral augmentation are investigated to be effective for improving speech recognition performance. Second, several neural network architectures are explored by different combinations of deep residual network (ResNet), factorized time delay neural network (TDNNF) and residual bidirectional long short-term memory (RBiLSTM). Finally, multiple acoustic models can be combined via minimum Bayes risk fusion. Compared with the official baseline acoustic model, the proposed solution can achieve a relatively word error rate reduction of 19% for the best single ASR system on the evaluation data, which is also one of main contributions to our top system for the Track 1 tasks of the CHiME-6 Challenge.

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00