USING MODIFIED ADULT SPEECH AS DATA AUGMENTATION FOR CHILD SPEECH RECOGNITION

Zijian Fan (Norwegian University of Science and Technology); Xinwei Cao (NTNU); Giampiero Salvi (NTNU); Torbjørn Svendsen (NTNU)

07 Jun 2023

Data augmentation is a technique that increases the size and quality of training data so that deep learning and other machine learning models can achieve better performance. This paper proposes a novel way of applying data augmentation to child speech recognition in low-resource scenarios. Augmentation is achieved by modifying existing adult speech signals in two main steps: resampling and time scaling. The experiments use speech from children ranging from kindergarten to grade 10, together with adult speech. We test the proposed method using both a TDNN-HMM and a GMM-HMM acoustic model. The results show that the proposed data augmentation scheme achieves a 7.95% relative reduction in word error rate (WER), compared with a 4.56% relative reduction when using a traditional bilinear frequency warping approach.
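The two steps named in the abstract, resampling and time scaling, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no factors, window sizes, or algorithms, so the function names, the linear-interpolation resampler, the plain overlap-add stretcher, and the `warp=1.2` default are all assumptions chosen for illustration. In this sketch, resampling by a factor above 1 raises pitch and formants when the result is played at the original sample rate, and time scaling then stretches the duration back; the paper may choose its time-scaling factor differently.

```python
import math


def resample_linear(signal, factor):
    """Resample by linear interpolation (hypothetical helper).

    The output has about len(signal) / factor samples. Played back at the
    original sample rate, factor > 1 scales all frequencies up by `factor`
    and shortens the duration by the same factor.
    """
    n_out = int(len(signal) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def time_scale_ola(signal, rate, frame=400, hop_out=100):
    """Change duration by `rate` (rate > 1 lengthens) with plain overlap-add.

    Plain OLA is the simplest time-scaling method and produces audible
    artifacts; it stands in here for whatever method the paper uses.
    """
    hop_in = hop_out / rate                   # analysis hop in input samples
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame)
              for n in range(frame)]          # Hann window
    n_frames = max(1, int((len(signal) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len                    # summed window weights
    for k in range(n_frames):
        start_in = int(k * hop_in)
        start_out = k * hop_out
        for n in range(frame):
            idx = start_in + n
            if idx >= len(signal):
                break
            out[start_out + n] += window[n] * signal[idx]
            norm[start_out + n] += window[n]
    # Divide by the summed weights so overlapping frames do not change gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def adult_to_childlike(signal, warp=1.2):
    """Assumed pipeline: shift the spectrum up, then restore the duration."""
    shifted = resample_linear(signal, warp)
    return time_scale_ola(shifted, warp)


if __name__ == "__main__":
    sr = 16000
    adult = [math.sin(2.0 * math.pi * 200.0 * n / sr) for n in range(sr)]
    child_like = adult_to_childlike(adult, warp=1.2)
    print(len(adult), len(child_like))
```

Keeping the two steps as separate functions lets the frequency shift and the duration change be varied independently, which is one way to generate several augmented copies of each adult utterance. A production pipeline would normally use a band-limited resampler and a phase-aware time-scaling method from a signal-processing library rather than these pure-Python loops.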

Tags:

New algorithms and approaches for speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
