Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 12:47
04 May 2020

Accented speech poses significant challenges for state-of-the-art automatic speech recognition (ASR) systems. Accent is a property of speech that lasts throughout an utterance in varying degrees of strength. This makes it hard to isolate the influence of accent on individual speech sounds. We propose coupled training for encoder-decoder ASR models that acts on pairs of utterances corresponding to the same text spoken by speakers with different accents. This training regime introduces an L2 loss between the attention-weighted representations corresponding to pairs of utterances with the same text, thus acting as a regularizer and encouraging representations from the encoder to be more accent-invariant. We focus on recognizing accented English samples from the Mozilla Common Voice corpus. We obtain significant error rate reductions on accented samples from a large set of diverse accents using coupled training. We also show consistent improvements in performance on heavily accented samples (as determined by a standalone accent classifier).

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00