BW-EDA-EEND: Streaming End-To-End Neural Speaker Diarization For A Variable Number Of Speakers

Eunjung Han, Chul Lee, Andreas Stolcke

Length: 00:11:20

11 Jun 2021

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the EDA architecture of Horiguchi et al., but uses an incremental Transformer encoder that attends only to its left context and uses block-level recurrence in the hidden states to carry information from block to block, making the algorithm's complexity linear in time. We propose two variants. For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation relative to offline EDA-EEND for up to two speakers using a context size of 10 seconds. With more than two speakers, the accuracy gap between online and offline grows, but the system still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with a context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block-by-block as audio arrives, we show accuracy comparable to the offline clustering-based system.
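
The block-level recurrence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes single-head attention with a residual connection and omits layer normalization, feed-forward sublayers, and the EDA attractor module. The names `blockwise_encode`, `attend`, `block_size`, and `context_blocks` are hypothetical. Each block's queries attend over the current block plus cached hidden states from a bounded number of previous blocks, so the per-block cost is constant and the total cost grows linearly with the input length.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, memory, w_q, w_k, w_v):
    # Single-head scaled dot-product attention (simplifying assumption).
    q = queries @ w_q
    k = memory @ w_k
    v = memory @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def blockwise_encode(frames, layers, block_size, context_blocks):
    """Encode frames of shape (T, D) block by block, using left context only.

    layers is a list of (w_q, w_k, w_v) tuples, each matrix of shape (D, D).
    cache[l] stores the inputs to layer l for the most recent blocks;
    these cached hidden states carry information from block to block.
    """
    cache = [[] for _ in layers]
    outputs = []
    for start in range(0, len(frames), block_size):
        h = frames[start:start + block_size]
        for l, (w_q, w_k, w_v) in enumerate(layers):
            # Keys and values: cached left context plus the current block.
            memory = np.concatenate(cache[l] + [h], axis=0)
            new_h = h + attend(h, memory, w_q, w_k, w_v)  # residual connection
            # Keep at most context_blocks previous blocks for this layer.
            if context_blocks > 0:
                cache[l] = (cache[l] + [h])[-context_blocks:]
            h = new_h
        outputs.append(h)
    return np.concatenate(outputs, axis=0)


# Usage with random weights: 12 frames, 3 blocks of 4 frames each.
rng = np.random.default_rng(0)
dim = 4
layers = [tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
          for _ in range(2)]
frames = rng.standard_normal((12, dim))
encoded = blockwise_encode(frames, layers, block_size=4, context_blocks=1)
```

Because no block ever reads frames to its right, the output for an earlier block is unchanged when later audio changes, which is the property that lets outputs be emitted as audio arrives. With more than one layer, information from blocks older than `context_blocks` can still reach the current block indirectly through the cached hidden states.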

Chairs:

Man-Wai Mak

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021
