MULTI-STAGE AND MULTI-LOSS TRAINING FOR FULLBAND NON-PERSONALIZED AND PERSONALIZED SPEECH ENHANCEMENT
Lianwu Chen, Chenglin Xu, Xu Zhang, Xinlei Ren, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu
SPS
Length: 00:16:13
Deep learning-based wideband (16 kHz) speech enhancement approaches have surpassed traditional methods. This work extends existing wideband systems to fullband (48 kHz) speech enhancement while preserving automatic speech recognition (ASR) compatibility and, optionally, supporting personalized speech enhancement. As the evaluation results show, this is achieved with a multi-stage, multi-loss training architecture that incorporates the recently proposed two-step structure, an ASR loss produced by a back-end ASR encoder, and a speaker extraction network.
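To make the multi-loss idea concrete, the sketch below combines a signal-level enhancement loss with an ASR loss computed on the outputs of a frozen encoder. It is only an illustration under stated assumptions: the abstract does not give the actual loss functions or weights, so the mean squared error terms, the `asr_weight` parameter, and the `toy_encoder` stand-in are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_loss(
    enhanced: Vector,
    clean: Vector,
    asr_encoder: Callable[[Vector], Vector],
    asr_weight: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Weighted sum of a signal loss and an ASR-encoder feature loss."""
    # Signal-level term: how far the enhanced signal is from the clean one.
    signal_loss = mse(enhanced, clean)
    # ASR term: distance between encoder features of enhanced and clean
    # speech, so the enhancer is pushed to keep what the recognizer uses.
    asr_loss = mse(asr_encoder(enhanced), asr_encoder(clean))
    return signal_loss + asr_weight * asr_loss


def toy_encoder(x: Vector) -> List[float]:
    """Hypothetical stand-in for a frozen ASR encoder (adjacent averages)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(len(x) - 1)]


if __name__ == "__main__":
    clean = [0.0, 0.5, -0.5, 0.25]
    enhanced = [0.1, 0.4, -0.45, 0.2]
    print(multi_loss(enhanced, clean, toy_encoder))
```

In a real system the encoder would be a pretrained ASR model with its parameters held fixed, and the personalized variant would additionally condition the enhancer on a speaker embedding; neither is modeled here.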