PUNCTUATION PREDICTION FOR STREAMING ON-DEVICE SPEECH RECOGNITION

Zhikai Zhou, Yanmin Qian, Tian Tan

Length: 00:08:28

10 May 2022

Punctuation prediction is essential for automatic speech recognition (ASR). Although many approaches to punctuation prediction have been proposed, on-device scenarios with end-to-end ASR are rarely discussed. Punctuation prediction is often treated as a post-processing step applied to ASR outputs, which ignores the mismatch between the natural-language text used as training input and the ASR hypotheses seen at test time. In addition, language models built with deep neural networks are too large for edge devices. In this paper, we discuss one-pass models that perform both ASR and punctuation prediction, replacing the conventional two-pass post-processing pipeline. We then propose a joint ASR-punctuation model that uses multi-task learning to decouple recognition and punctuation in the ASR decoder. Experimental results show that the proposed joint model not only outperforms the traditional post-processing method while adding only a limited number of parameters, but also achieves better accuracy than direct ASR modeling on transcripts with punctuation.

Tags: streaming speech recognition, multi-task, edge devices, punctuation prediction

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
