IMPROVING SEPARATION-BASED SPEAKER DIARIZATION VIA ITERATIVE MODEL REFINEMENT AND SPEAKER EMBEDDING BASED POST-PROCESSING
Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee
SPS
In this paper, we propose an iterative separation-based speaker diarization (ISSD) approach to cope with realistic data conditions. In ISSD, we iteratively generate adaptation data according to speaker priors and fine-tune the separation model, which gradually improves performance. Errors in the speaker priors still cause some speaker detection errors that ISSD alone cannot avoid. To reduce them, we use speaker embedding information and propose two post-processing techniques: speaker filtering and speaker recovery. We evaluate diarization performance on the two-speaker conversational telephone speech (CTS) data set from the DIHARD-III Challenge. Compared with a state-of-the-art clustering-based speaker diarization (CSD) system, the proposed ISSD approach combined with the two post-processing schemes yields relative diarization error rate (DER) reductions of 47.72% and 46.97% on the development and evaluation sets, respectively. ISSD is also a key component of the best-performing system in the DIHARD-III Challenge.
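A relative DER reduction is measured against the baseline's own error rate, not in absolute percentage points. The minimal Python sketch below shows that computation. The function name `relative_reduction` and the DER values in it are hypothetical and are used only for illustration. They are not results from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of an error rate, as a percentage of the baseline.

    Both arguments are error rates in the same units (e.g. DER in percent).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical DER values (percent), for illustration only; not from the paper.
    baseline_der = 20.0
    proposed_der = 10.0
    print(f"{relative_reduction(baseline_der, proposed_der):.2f}%")  # 50.00%
```

Under this definition, a 47.72% relative reduction means that the proposed system's DER is about 52.28% of the CSD baseline's DER on the development set, whatever the baseline's absolute value is.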