11 May 2022

Language models (LMs), which are commonly pre-trained on large corpora, have proven robust and effective for Natural Language Understanding (NLU) tasks in many applications such as virtual assistants and recommendation systems. These applications normally receive the outputs of an automatic speech recognition (ASR) module as spoken-form inputs, which generally lack both lexical and syntactic information. Pre-trained language models, for example BERT (Devlin et al., 2019) or XLM-RoBERTa (Conneau et al., 2020), which are typically pre-trained on written-form corpora, show degraded performance on NLU tasks with spoken-form inputs. In this paper, we propose CapuBERT, a language model trained to handle spoken-form input from the ASR module. Experimental results show that the proposed model achieves state-of-the-art results on several NLU tasks, including Part-of-Speech tagging, Named-Entity Recognition, and Chunking, in English, German, and Vietnamese.
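As a concrete illustration of the gap the abstract describes, the sketch below converts written-form text into the casing- and punctuation-free spoken form an ASR module typically emits, then runs an off-the-shelf written-form NER model on both versions. This is a minimal sketch, not part of the paper: the normalization rules and the dslim/bert-base-NER checkpoint are illustrative assumptions chosen to make the degradation observable.

```python
import re

from transformers import pipeline

def to_spoken_form(text: str) -> str:
    """Simulate ASR output: lowercase the text and strip punctuation,
    removing the surface cues a written-form corpus provides."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

# A publicly available written-form NER checkpoint (illustrative choice).
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

written = "Angela Merkel visited Paris, France, on 11 May 2022."
spoken = to_spoken_form(written)  # "angela merkel visited paris france on 11 may 2022"

print(ner(written))  # casing and punctuation intact: entities usually recovered
print(ner(spoken))   # spoken form: entity recall typically drops sharply
```

Comparing the two outputs makes the paper's motivation tangible: the same tokens, minus capitalization and punctuation, are markedly harder for a model pre-trained on written text, which is the setting CapuBERT targets.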
