A TIME DOMAIN PROGRESSIVE LEARNING APPROACH WITH SNR CONSTRICTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION
Zhaoxu Nian, Jun Du, Yu Ting Yeung, Renyu Wang
Single-channel speech enhancement for automatic speech recognition (ASR) has been widely studied. However, most speech enhancement methods over-suppress the signal and introduce distortion, which limits the performance gain or even degrades back-end performance. The key to solving this problem is to preserve the integrity of the speech while suppressing the background noise. We therefore propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. The TDPL model consists of an encoder, a progressive enhancer, and a decoder. It learns both an SNR-increased intermediate target with less speech distortion and a clean target with better listening quality and intelligibility, which serve ASR pre-processing and speech communication, respectively. We also present an SNR constriction loss suited to TDPL that further improves ASR performance. We evaluate the proposed methods on the CHiME-4 real evaluation set. The results show that TDPL significantly outperforms time domain speech enhancement methods and frequency domain progressive learning methods on the ASR task, and that the intermediate output of TDPL achieves a 36.3% relative word error rate reduction with a powerful ASR back-end without retraining. Moreover, the estimated clean output improves PESQ and STOI on the CHiME-4 simulation evaluation set.
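To make the idea of an SNR-increased intermediate target concrete, the following is a minimal sketch of one common way such a target can be built: keep the clean speech unchanged and attenuate only the noise so that the mixture's SNR rises by a fixed amount. The abstract does not state how the targets are defined or what SNR gain is used, so the 10 dB gain, the toy signals, and the function names `snr_db` and `intermediate_target` are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def snr_db(speech, noise):
    """Return the signal-to-noise ratio in dB for equal-length sample lists."""
    speech_power = sum(s * s for s in speech)
    noise_power = sum(n * n for n in noise)
    return 10.0 * math.log10(speech_power / noise_power)


def intermediate_target(speech, noise, snr_gain_db):
    """Mix clean speech with attenuated noise so the SNR rises by snr_gain_db.

    Scaling the noise amplitude by 10^(-gain/20) lowers its power by
    gain dB while leaving the speech component untouched.
    """
    scale = 10.0 ** (-snr_gain_db / 20.0)
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy signals standing in for real recordings (hypothetical data).
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 0.5) for _ in range(16000)]

# The noisy network input and an intermediate target with an assumed 10 dB gain.
noisy = [s + n for s, n in zip(speech, noise)]
target = intermediate_target(speech, noise, snr_gain_db=10.0)

# Residual noise that remains in the intermediate target.
residual = [t - s for t, s in zip(target, speech)]
input_snr = snr_db(speech, noise)
target_snr = snr_db(speech, residual)
print(f"input SNR: {input_snr:.2f} dB, intermediate target SNR: {target_snr:.2f} dB")
```

Because only the noise is scaled, the speech component of the intermediate target is identical to the clean speech, which matches the abstract's description of a target with less speech distortion; the clean target corresponds to removing the noise entirely.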