Short-segment speaker verification using ECAPA-TDNN with multi-resolution encoder

Sangwook Han (GIST); Youngdo Ahn (GIST); Kyeongmuk Kang (GIST); Jong Won Shin (Gwangju Institute of Science and Technology)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Time-domain approaches have shown the potential to improve speaker verification performance, but the predominant approaches still rely on hand-crafted features such as mel filterbank energies. Although these features are based on speech perception models and have exhibited impressive performance, a fixed frame size cannot provide good temporal and spectral resolution at the same time, and information is lost when the magnitude spectrum is taken and when the frequency axis is rescaled. In this paper, we propose to incorporate multi-resolution time-domain information into the ECAPA-TDNN speaker verification system. We construct a multi-resolution encoder that extracts multiple features at different temporal resolutions, and let the extracted features drive the adapter modules. Experimental results show that the proposed method outperforms other recently proposed approaches on the VoxCeleb dataset when the input length is 2 seconds or shorter. The proposed approach also shows superior performance on the Google Speech Commands dataset v2.
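
The resolution trade-off mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the internals of the multi-resolution encoder, so the code below is not the authors' method: it only shows that a window of N samples at sample rate fs spans N/fs seconds and gives a DFT bin spacing of fs/N Hz, and how the same waveform could be framed with several window lengths on a shared hop so that the resulting frame sequences stay aligned in time. The function names, the 16 kHz sample rate, the window lengths, and the hop are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def resolution(sample_rate: int, win_len: int) -> Tuple[float, float]:
    """Return (window duration in ms, DFT bin spacing in Hz) for one frame size."""
    duration_ms = 1000.0 * win_len / sample_rate
    bin_spacing_hz = sample_rate / win_len
    return duration_ms, bin_spacing_hz


def multi_resolution_frames(
    samples: Sequence[float], win_lengths: Sequence[int], hop: int
) -> Dict[int, List[List[float]]]:
    """Frame one waveform with several window lengths (hypothetical helper).

    Every resolution uses the same hop, and the tail is zero-padded, so each
    window length yields the same number of frames.
    """
    n_frames = math.ceil(len(samples) / hop)
    framed: Dict[int, List[List[float]]] = {}
    for win in win_lengths:
        # Pad so that the last frame, starting at (n_frames - 1) * hop, is full.
        pad = max(0, (n_frames - 1) * hop + win - len(samples))
        padded = list(samples) + [0.0] * pad
        framed[win] = [padded[i * hop : i * hop + win] for i in range(n_frames)]
    return framed


if __name__ == "__main__":
    # Assumed values: 16 kHz audio, three illustrative window lengths.
    for win in (32, 128, 400):
        ms, hz = resolution(16000, win)
        print(f"window={win} samples: {ms} ms long, {hz} Hz bin spacing")
```

Short windows localize events in time but give coarse frequency bins, and long windows do the opposite, which is why a single fixed frame size cannot offer both. In a learned time-domain encoder, each window length would typically feed a trainable filter bank rather than raw frames; that part is outside what the abstract specifies.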
