Multi-Feature Learning With Canonical Correlation Analysis Constraint For Text-Independent Speaker Verification
Zheng Li, Miao Zhao, Lin Li, Qingyang Hong
In order to improve the performance and robustness of text-independent speaker verification systems, various speaker embedding representation learning algorithms have been developed. A common approach is to exploit multiple kinds of features to describe speaker-related embeddings, for example by introducing additional acoustic features or features at different resolution scales. In this paper, a new multi-feature learning strategy with a canonical correlation analysis (CCA) constraint is proposed to learn intrinsic speaker embeddings by maximizing the correlation between two features extracted from the same utterance. On top of the multi-feature learning structure, a CCA constraint layer and a CCA loss are used to capture the correlated representation between the two kinds of features and to alleviate redundancy. Two multi-feature learning strategies are studied: one pairing two acoustic features, and the other pairing short-term and long-term features. Furthermore, we improve the short- and long-term feature learning structure by replacing the LSTM block with a Bidirectional GRU (B-GRU) block and introducing more dense layers. The effectiveness of these improvements is demonstrated on the VoxCeleb 1 evaluation set, a noisy VoxCeleb 1 evaluation set, and the SITW evaluation set.
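To make the idea of a correlation-maximizing constraint concrete, the following is a minimal PyTorch sketch of a loss that rewards high per-dimension correlation between the embeddings produced by two feature branches of the same utterance. It is a simplified stand-in, not the paper's exact CCA constraint layer or CCA loss; all names, dimensions, and the weighting against the speaker classification loss are assumptions for illustration.

```python
import torch
import torch.nn as nn


class CorrelationLoss(nn.Module):
    """Negative mean per-dimension correlation between two embedding batches.

    A simplified, CCA-inspired constraint: minimizing it pushes the two
    branches to produce embeddings whose corresponding dimensions are
    highly correlated across the batch. (Illustrative only; not the
    paper's exact formulation.)
    """

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        # h1, h2: (batch, dim) embeddings from the two feature branches,
        # extracted from the same utterances.
        h1 = h1 - h1.mean(dim=0, keepdim=True)   # center over the batch
        h2 = h2 - h2.mean(dim=0, keepdim=True)
        cov = (h1 * h2).mean(dim=0)              # per-dimension covariance
        std1 = h1.pow(2).mean(dim=0).sqrt()
        std2 = h2.pow(2).mean(dim=0).sqrt()
        corr = cov / (std1 * std2 + self.eps)    # Pearson correlation per dim
        # Maximizing correlation corresponds to minimizing its negative mean.
        return -corr.mean()


if __name__ == "__main__":
    # Toy usage: two 256-dim embedding batches from two hypothetical branches.
    torch.manual_seed(0)
    emb_a = torch.randn(32, 256)                 # e.g. embedding from one acoustic feature
    emb_b = emb_a + 0.1 * torch.randn(32, 256)   # e.g. embedding from a second feature
    print(CorrelationLoss()(emb_a, emb_b).item())  # near -1 for strongly correlated inputs
    # In training, such a term would typically be added to the speaker
    # classification loss, e.g.:
    # total_loss = ce_loss + cca_weight * CorrelationLoss()(emb_a, emb_b)
```

The architectural change mentioned above, swapping an LSTM block for a bidirectional GRU, can likewise be sketched as a drop-in replacement; the layer sizes below are assumptions.

```python
import torch.nn as nn

# Hypothetical drop-in change for the long-term feature branch:
# an LSTM block replaced by a bidirectional GRU (B-GRU) block.
lstm_block = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
bgru_block = nn.GRU(input_size=512, hidden_size=256, batch_first=True,
                    bidirectional=True)   # output dimension doubles to 512
```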