USING MODIFIED ADULT SPEECH AS DATA AUGMENTATION FOR CHILD SPEECH RECOGNITION
Zijian Fan (Norwegian University of Science and Technology); Xinwei Cao (NTNU); Giampiero Salvi (NTNU); Torbjørn Svendsen (NTNU)
Data augmentation is a technique that increases the size and improves the quality of training data so that machine learning models, including deep learning models, achieve better performance. This paper proposes a novel way of applying data augmentation to child speech recognition in low-resource scenarios. Augmentation is achieved by modifying existing adult speech signals in two main steps: resampling and time scaling. The experiments use speech from children ranging from kindergarten to grade 10, as well as adult speech. We test the proposed method with both a TDNN-HMM and a GMM-HMM acoustic model. The results show that the proposed data augmentation scheme achieves a 7.95% relative reduction in WER, compared with a 4.56% relative reduction for a traditional bilinear frequency warping approach.
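
The two-step idea in the abstract (resampling, then time scaling) can be illustrated with a minimal sketch. The abstract does not specify which algorithms or parameter values the authors used, so everything here is an assumption: linear interpolation stands in for the resampler, WSOLA (waveform-similarity overlap-add) stands in for the time-scaling step, and the function names `resample_linear`, `time_scale_wsola`, and `adult_to_childlike`, along with the factor `warp=1.2`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def resample_linear(x, factor):
    """Resample x to about len(x)/factor samples by linear interpolation.

    Played back at the original sampling rate, factor > 1 raises pitch and
    formants and shortens the signal. No anti-aliasing filter is applied,
    which is adequate only for a sketch.
    """
    n_out = int(round(len(x) / factor))
    src_pos = np.arange(n_out) * factor
    return np.interp(src_pos, np.arange(len(x)), x)


def time_scale_wsola(x, rate, frame_len=400, tol=80):
    """Change duration without changing pitch using a basic WSOLA.

    rate < 1 lengthens the signal, rate > 1 shortens it (rate must be > 0).
    frame_len should be even; tol is the search range in samples.
    """
    hop = frame_len // 2
    if len(x) < frame_len + hop + tol:
        raise ValueError("signal too short for the chosen frame length")
    # Periodic Hann window: overlapping copies at 50% overlap sum to 1.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = int((len(x) - frame_len - hop - tol) / (hop * rate)) + 1
    out = np.zeros((n_frames - 1) * hop + frame_len)
    pos = 0
    out[:frame_len] += x[:frame_len] * window
    for i in range(1, n_frames):
        # Natural continuation of the previously copied frame.
        target = x[pos + hop : pos + hop + frame_len]
        nominal = int(round(i * hop * rate))
        lo = max(0, nominal - tol)
        hi = nominal + tol
        # Pick the input frame near the nominal position that best matches
        # the target, so overlapping frames add in phase.
        region = x[lo : hi + frame_len]
        scores = np.correlate(region, target, mode="valid")
        pos = lo + int(np.argmax(scores))
        start = i * hop
        out[start : start + frame_len] += x[pos : pos + frame_len] * window
    return out


def adult_to_childlike(x, warp=1.2):
    """Shift the spectrum up by `warp`, then restore roughly the original
    duration. The value 1.2 is a hypothetical example, not from the paper."""
    shifted = resample_linear(x, warp)
    return time_scale_wsola(shifted, rate=1.0 / warp)


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate in Hz
    t = np.arange(sr) / sr
    adult = np.sin(2 * np.pi * 200.0 * t)  # synthetic stand-in for adult speech
    child_like = adult_to_childlike(adult)
    print(len(adult) / sr, len(child_like) / sr)
```

In this sketch the time-scaling rate is set to the inverse of the resampling factor so that the output has roughly the input duration; a different rate could be used to also mimic a slower speaking rate, which is a design choice the abstract leaves open.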