Noise-Disentanglement Metric Learning for Robust Speaker Verification

Yao Sun (Tianjin University); Hanyi Zhang (Tianjin University); Longbiao Wang (Tianjin University); Kong Aik Lee (Institute for Infocomm Research, A*STAR); Meng Liu (Tianjin University); Jianwu Dang (Tianjin University)

07 Jun 2023

Automatic speaker verification (ASV) degrades in noisy environments. To address this problem, we propose noise-disentanglement metric learning, which suppresses speaker-irrelevant noise components and builds a noise-invariant embedding space. Specifically, a disentanglement module, consisting of a speaker encoder and a reconstruction module, decouples the speech signal. The speaker encoder extracts the speaker-related components, and the reconstruction module, by reconstructing the signal, strengthens the model's ability to constrain noise information. In addition, distribution optimization is introduced to supervise the spatial structure of speaker embeddings in noisy environments. Experiments on VoxCeleb1 indicate that the proposed method improves speaker verification performance in both clean and noisy conditions.
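
The abstract does not give the loss functions, so the following is only a minimal sketch of how a reconstruction term and a noise-invariance term could be combined into one training objective. Everything in it is an assumption made for illustration: the function names, the mean-squared-error reconstruction loss, the cosine-based invariance loss between clean and noisy embeddings of the same utterance, and the weights `w_rec` and `w_inv`. It is not the authors' formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def reconstruction_loss(target, reconstructed):
    """Mean squared error between target features and their reconstruction.

    Assumption: which signal is reconstructed is not stated in the abstract.
    """
    return sum((t - r) ** 2 for t, r in zip(target, reconstructed)) / len(target)


def noise_invariance_loss(clean_emb, noisy_emb):
    """Hypothetical invariance term: 0 when the clean and noisy embeddings
    of the same utterance point in the same direction, up to 2 when opposite."""
    return 1.0 - cosine_similarity(clean_emb, noisy_emb)


def combined_loss(target, reconstructed, clean_emb, noisy_emb,
                  w_rec=1.0, w_inv=1.0):
    """Weighted sum of the two illustrative terms (weights are hypothetical)."""
    return (w_rec * reconstruction_loss(target, reconstructed)
            + w_inv * noise_invariance_loss(clean_emb, noisy_emb))


if __name__ == "__main__":
    # Toy values only; real inputs would be acoustic features and embeddings.
    loss = combined_loss(
        target=[1.0, 2.0], reconstructed=[1.0, 4.0],
        clean_emb=[1.0, 0.0], noisy_emb=[0.0, 1.0],
    )
    print(loss)
```

In a real system these terms would sit alongside a speaker-discriminative loss and be computed on batched tensors by a deep learning framework; the scalar version here only shows how the pieces described in the abstract might relate.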

Tags:

Speaker recognition/identification/diarization

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
