MULTI-FEATURE INTEGRATION FOR SPEAKER EMBEDDING EXTRACTION
Sreekanth Sankala, Shaik Mohammad Rafi B, Sri Rama Murty K
SPS
The performance of automatic speaker recognition systems has become increasingly accurate with the advancement of deep learning methods. However, current speaker recognition systems remain sensitive to their training conditions, so performance degrades drastically even on slightly mismatched test data. Many approaches, such as data augmentation schemes, alternative loss functions, and multi-feature integration, have been proposed and shown to improve performance. This work focuses on integrating multiple features to improve speaker verification performance. Speaker information is commonly represented in several kinds of features, and redundant or irrelevant information, such as noise and channel characteristics, affects the dimensions of different features in different ways. In this work, we aim to maximize the speaker information by reconstructing the speaker information extracted from one feature from the other features, while simultaneously minimizing the irrelevant information. Experiments with the multi-feature integration model demonstrate significant performance improvements over the stand-alone models. The extracted speaker embeddings are also found to be noise-robust.
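The core idea, reconstructing the speaker information captured by one feature from another so that only information shared across features survives, can be illustrated with a minimal sketch. The dimensions, encoder form, and loss below are illustrative assumptions, not the paper's actual architecture; trained networks are stood in for by random linear encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): two front-end features,
# e.g. 40-dim MFCCs and 64-dim filterbanks, mapped to a shared
# 128-dim embedding space.
DIM_A, DIM_B, EMB = 40, 64, 128

# Randomly initialised weights stand in for trained encoder networks.
W_a = rng.standard_normal((DIM_A, EMB)) * 0.1
W_b = rng.standard_normal((DIM_B, EMB)) * 0.1
# Cross-reconstruction maps: predict one feature's embedding from the other's.
R_ab = rng.standard_normal((EMB, EMB)) * 0.1
R_ba = rng.standard_normal((EMB, EMB)) * 0.1

def embed(x, W):
    """Project frame-level features (frames x dim) to frame embeddings."""
    return np.tanh(x @ W)

def integration_loss(feat_a, feat_b):
    """Cross-reconstruction objective: each feature's embedding should be
    predictable from the other. Minimising it retains information shared
    across features (speaker identity) and penalises feature-specific
    nuisance such as noise and channel effects."""
    e_a, e_b = embed(feat_a, W_a), embed(feat_b, W_b)
    loss_ab = np.mean((e_a @ R_ab - e_b) ** 2)  # reconstruct B from A
    loss_ba = np.mean((e_b @ R_ba - e_a) ** 2)  # reconstruct A from B
    # Average-pool over frames and fuse to one utterance-level embedding.
    fused = (e_a.mean(axis=0) + e_b.mean(axis=0)) / 2.0
    return loss_ab + loss_ba, fused

# Toy utterance: 200 frames of each feature type.
feat_a = rng.standard_normal((200, DIM_A))
feat_b = rng.standard_normal((200, DIM_B))
loss, embedding = integration_loss(feat_a, feat_b)
print(loss, embedding.shape)
```

In a full system, the encoder and reconstruction weights would be trained jointly with a speaker-classification loss, and the fused utterance-level embedding would be scored for verification.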