  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:05:44
09 Jun 2021

Code-switch speech recognition has long been a concern for automatic speech recognition (ASR) systems. It is challenging because of data scarcity and the diverse syntactic structures across languages. In this paper, we focus on code-switch speech recognition in mainland China, whose linguistic characteristics differ clearly from those of Hong Kong and Southeast Asia. We propose a novel approach that uses only monolingual data for code-switch second-pass speech recognition, also known as language model rescoring. The approach converts a code-switch sentence into a monolingual sentence through a word mapping step and a language model determination step, so data scarcity no longer needs to be considered. The word pairs used in the word mapping step are produced by a carefully designed generation process that incorporates machine translation, word alignment, and other techniques. We show that the proposed approach achieves a relative word error rate (WER) reduction of over 7.23% compared with monolingual language model (MLM) rescoring on our test set.
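The rescoring idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the word-pair table `WORD_MAP`, the majority-vote rule in `determine_language` (the abstract does not say how the language model is determined), the toy add-one-smoothed `BigramLM` standing in for a real monolingual LM, and the score combination `first_pass + lm_weight * logprob` in `rescore` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical English -> Mandarin word-pair table. The paper generates its pairs
# with machine translation and word alignment; these entries are hand-written.
WORD_MAP: Dict[str, str] = {"meeting": "会议", "email": "邮件"}


def is_latin(token: str) -> bool:
    """Treat ASCII alphabetic tokens as English (a simplifying assumption)."""
    return token.isascii() and token.isalpha()


def determine_language(tokens: List[str]) -> str:
    """Hypothetical determination rule: the language with more tokens wins."""
    n_en = sum(is_latin(t) for t in tokens)
    return "en" if n_en > len(tokens) - n_en else "zh"


def to_monolingual(tokens: List[str], word_map: Dict[str, str]) -> Tuple[str, List[str]]:
    """Map embedded English words into Mandarin when Mandarin dominates."""
    lang = determine_language(tokens)
    if lang == "zh":
        return lang, [word_map.get(t, t) if is_latin(t) else t for t in tokens]
    return lang, list(tokens)  # the reverse (Mandarin -> English) mapping is omitted


class BigramLM:
    """Toy add-one-smoothed bigram LM standing in for a real monolingual LM."""

    def __init__(self, sentences: List[List[str]]):
        self.context: Counter = Counter()
        self.bigrams: Counter = Counter()
        vocab = {"</s>"}
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            self.context.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
            vocab.update(s)
        self.v = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def logprob(self, tokens: List[str]) -> float:
        toks = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.context[a] + self.v))
            for a, b in zip(toks[:-1], toks[1:])
        )


def rescore(nbest: List[Tuple[List[str], float]], lms: Dict[str, BigramLM],
            word_map: Dict[str, str], lm_weight: float = 0.5) -> List[str]:
    """Score each hypothesis on its monolingual form; return the code-switch text."""
    def total(hyp: Tuple[List[str], float]) -> float:
        tokens, first_pass = hyp
        lang, mono = to_monolingual(tokens, word_map)
        return first_pass + lm_weight * lms[lang].logprob(mono)
    return max(nbest, key=total)[0]


zh_lm = BigramLM([["我们", "开", "会议"], ["发", "邮件", "给", "我们"]])
en_lm = BigramLM([["send", "the", "email"]])
nbest = [(["我们", "开", "米听"], -10.0), (["我们", "开", "meeting"], -10.0)]
print(rescore(nbest, {"zh": zh_lm, "en": en_lm}, WORD_MAP))  # ['我们', '开', 'meeting']
```

Each hypothesis is scored on its mapped monolingual form, so only Mandarin training text is needed for the Mandarin LM, yet the returned transcript keeps the original English word.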

Chairs:
Karen Livescu
