Correction Of Automatic Speech Recognition With Transformer Sequence-To-Sequence Model

Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Length: 14:50

04 May 2020

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition. Our model has a Transformer-based encoder-decoder architecture which "translates" acoustic model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates a significant improvement in word error rate over the baseline acoustic model with greedy decoding, especially on the much noisier dev-other and test-other portions of the evaluation dataset. Our model also outperforms the baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with a Transformer-XL neural language model.
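The abstract reports its results in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and benchmark scoring typically also normalizes the text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up strings, only to show how a correction step would be scored.
reference = "the quick brown fox"
raw_asr_output = "the quik brown fax"
corrected_output = "the quick brown fox"
wer_before = word_error_rate(reference, raw_asr_output)
wer_after = word_error_rate(reference, corrected_output)
```

With these made-up strings, the raw output has two substituted words out of four, and the corrected output matches the reference exactly. A post-processing model of the kind described above is judged by how far it lowers this number relative to the greedy-decoded baseline.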

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
