MULTI-STAGE AND MULTI-LOSS TRAINING FOR FULLBAND NON-PERSONALIZED AND PERSONALIZED SPEECH ENHANCEMENT

Lianwu Chen, Chenglin Xu, Xu Zhang, Xinlei Ren, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu

Length: 00:16:13
07 May 2022

Deep learning-based wideband (16 kHz) speech enhancement approaches have surpassed traditional methods. This work extends existing wideband systems to fullband (48 kHz) speech enhancement while preserving compatibility with automatic speech recognition (ASR) and, optionally, supporting personalized speech enhancement. As the evaluation results show, this is achieved with a multi-stage, multi-loss training architecture that combines the recently proposed two-step structure, an ASR loss produced by a back-end ASR encoder, and a speaker extraction network.
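
The abstract does not spell out how the loss terms are combined. As a rough illustration only, the sketch below shows one common way a multi-loss objective can be formed: a signal-level loss plus a weighted ASR loss measured between encoder outputs for the enhanced and clean signals. The names `multi_loss`, `toy_encoder`, and `asr_weight`, the use of mean squared error, and the weight value are all assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_loss(
    enhanced: Signal,
    clean: Signal,
    asr_encoder: Callable[[Signal], Signal],
    asr_weight: float = 0.1,  # hypothetical weight, not from the paper
) -> float:
    """Weighted sum of a signal loss and an ASR-feature loss (assumed form)."""
    # Signal-level term: how far the enhanced waveform is from the clean one.
    signal_loss = mse(enhanced, clean)
    # ASR term: distance between encoder features of enhanced and clean speech.
    asr_loss = mse(asr_encoder(enhanced), asr_encoder(clean))
    return signal_loss + asr_weight * asr_loss


def toy_encoder(x: Signal) -> List[float]:
    """Hypothetical stand-in for an ASR encoder: averages adjacent sample pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


if __name__ == "__main__":
    enhanced = [1.0, 0.0, 1.0, 0.0]
    clean = [0.0, 0.0, 0.0, 0.0]
    print(multi_loss(enhanced, clean, toy_encoder))
```

In a real system the encoder would be a trained ASR model and the loss would be minimized by gradient descent; the toy encoder here only keeps the example self-contained.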
