04 May 2020

Using deep neural networks to extract speaker embeddings has significantly improved speaker verification. However, such embeddings remain vulnerable to channel variability. Previous works have used adversarial training to suppress channel information and extract channel-invariant embeddings, achieving significant improvements. Inspired by the success of joint multi-task and adversarial training with phonetic information for learning phonetic-invariant speaker embeddings, this paper develops a similar methodology to suppress channel variability. Treating the recording devices or environments as channel labels, two separate experiments are carried out, and consistent performance improvements are observed in both cases. The best performance is obtained by sequentially applying multi-task training at the statistics pooling layer and adversarial training at the embedding layer, achieving 10.77% and 9.37% relative EER improvements over the baselines at the recording-environment and recording-device levels, respectively.
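To make the described setup concrete, the sketch below shows where the two channel objectives attach in an x-vector-style network: a multi-task channel classifier on the statistics-pooling output, and an adversarial channel classifier on the speaker embedding through a gradient-reversal layer. This is a minimal illustrative PyTorch sketch, not the authors' implementation; the encoder structure, layer sizes, module names, and loss weighting are assumptions, and the paper applies the two objectives sequentially rather than in one joint model as written here.

```python
# Minimal sketch of multi-task + adversarial channel suppression for speaker
# embeddings. All hyperparameters and module names are illustrative assumptions.
import torch
import torch.nn as nn
from torch.autograd import Function


class GradReverse(Function):
    """Gradient reversal: identity in the forward pass, negated (scaled)
    gradient in the backward pass, so the encoder unlearns channel cues."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class ChannelInvariantXvector(nn.Module):
    def __init__(self, feat_dim=30, emb_dim=512, n_spk=1000, n_chan=8, lam=1.0):
        super().__init__()
        self.lam = lam
        # Simplified TDNN-style frame-level encoder.
        self.encoder = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        # Statistics pooling (mean + std) gives a 3000-dim utterance vector.
        self.embedding = nn.Linear(3000, emb_dim)
        self.spk_head = nn.Linear(emb_dim, n_spk)
        # Multi-task channel classifier at the statistics pooling layer.
        self.chan_mtl_head = nn.Linear(3000, n_chan)
        # Adversarial channel classifier at the embedding layer.
        self.chan_adv_head = nn.Linear(emb_dim, n_chan)

    def forward(self, feats):                              # feats: (B, feat_dim, T)
        h = self.encoder(feats)                            # (B, 1500, T')
        stats = torch.cat([h.mean(-1), h.std(-1)], dim=-1) # statistics pooling
        emb = self.embedding(stats)                        # speaker embedding
        spk_logits = self.spk_head(torch.relu(emb))
        chan_mtl_logits = self.chan_mtl_head(stats)        # multi-task branch
        chan_adv_logits = self.chan_adv_head(
            GradReverse.apply(emb, self.lam))              # adversarial branch
        return spk_logits, chan_mtl_logits, chan_adv_logits
```

A plausible (again, assumed) training objective would sum a speaker cross-entropy loss with a weighted channel cross-entropy on the multi-task branch, plus the channel cross-entropy on the adversarial branch, where the gradient-reversal layer turns the latter into a penalty on channel information in the embedding.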
