04 May 2020

Despite the significant progress of deep-learning-based speech separation methods, it remains challenging to extract and track the speech of target speakers, especially in single-channel, multi-speaker conditions. Previously, the authors proposed a source-aware context network that exploits the temporal context of the mixture and the estimated sources for online speech separation. In this paper, we propose a speaker-aware approach built on the source-aware context network, in which speaker information is explicitly modeled by an auxiliary speaker identification branch; speech separation and speaker tracking can then be jointly optimized by multi-task learning. Furthermore, we study the effectiveness of time-domain representations by proposing a raw sparse waveform encoder that preserves discriminative information. Experimental results on the WSJ0-2mix benchmark show that the proposed system significantly improves Signal-to-Distortion Ratio (SDR) performance.
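
The abstract does not give implementation details, but the multi-task setup it describes can be illustrated with a short sketch. Assuming a PyTorch model that outputs separated waveforms together with per-source speaker logits, the joint objective might pair a permutation-invariant SI-SNR separation loss with a cross-entropy speaker-identification loss from the auxiliary branch. The function names, tensor shapes, and the weight alpha below are hypothetical, not the authors' implementation.

```python
# A minimal sketch of the multi-task objective described above: a
# permutation-invariant separation loss (negative SI-SNR) combined with a
# cross-entropy loss from an auxiliary speaker-identification branch.
# All names, shapes, and the weighting factor are assumptions for illustration.
import itertools
import torch
import torch.nn.functional as F


def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR in dB for waveforms shaped (batch, n_src, time)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)


def multitask_loss(est_srcs, ref_srcs, spk_logits, spk_labels, alpha=0.1):
    """Joint loss for separation and speaker tracking (illustrative).

    est_srcs, ref_srcs: (batch, n_src, time) estimated / reference waveforms
    spk_logits:         (batch, n_src, n_speakers) auxiliary-branch outputs
    spk_labels:         (batch, n_src) integer speaker ids, in reference order
    alpha:              assumed weight of the speaker-identification task
    """
    batch, n_src, _ = ref_srcs.shape
    perms = list(itertools.permutations(range(n_src)))
    # Negative SI-SNR, averaged over sources, for every candidate permutation.
    sep_per_perm = torch.stack(
        [-si_snr(est_srcs[:, list(p)], ref_srcs).mean(dim=1) for p in perms], dim=1
    )                                              # (batch, n_perms)
    sep_loss, best = sep_per_perm.min(dim=1)       # best permutation per utterance
    perm_idx = torch.tensor(perms, device=est_srcs.device)[best]  # (batch, n_src)
    # Align the speaker logits with the chosen permutation before cross-entropy.
    aligned = torch.gather(
        spk_logits, 1, perm_idx.unsqueeze(-1).expand(-1, -1, spk_logits.size(-1))
    )
    spk_loss = F.cross_entropy(
        aligned.reshape(batch * n_src, -1), spk_labels.reshape(-1)
    )
    return sep_loss.mean() + alpha * spk_loss
```

In this reading, alpha trades off separation quality against speaker-tracking accuracy; the abstract only states that the two tasks are optimized jointly, so the exact form and weighting of the speaker loss is an assumption.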
