Frame-wise and overlap-robust speaker embeddings for meeting diarization

Tobias Cord-Landwehr (Paderborn University); Christoph B Boeddeker (Paderborn University); Catalin Zorila (Toshiba Cambridge Research Laboratory); Rama S Doddipatla (Toshiba Europe LTD); Reinhold Haeb-Umbach (University of Paderborn)

06 Jun 2023

Using a teacher-student training approach, we developed a speaker embedding extraction system that outputs embeddings at frame rate. Given this high temporal resolution and the fact that the student produces sensible speaker embeddings even for segments with overlapping speech, the frame-wise embeddings serve as an appropriate representation of the input speech signal for an end-to-end neural meeting diarization (EEND) system. We show in experiments that this representation helps mitigate a well-known problem of EEND systems: the drop in diarization performance as the number of speakers increases is significantly reduced. We also introduce block-wise processing so that arbitrarily long meetings can be diarized.

Tags:

Speaker recognition/identification/diarization

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person)
