Multiple Points Input For Convolutional Neural Networks In Replay Attack Detection
Sung-Hyun Yoon, Ha-Jin Yu
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 16:19
Models based on convolutional neural networks (CNNs) have shown remarkable performance in spoofing detection for automatic speaker verification. To feed data into CNN-based models in mini-batches, all samples in a mini-batch must have the same shape. Because speech utterances vary in length, they must first be brought to a common length. Segmentation is one way to do this: it divides each utterance into multiple fixed-length segments using a sliding window, and the model takes one segment as input. However, this limits the amount of information the model can consider at one time. We propose the multiple points input method to increase the amount of information considered at one time. The CNNs take input from multiple points in an utterance that are separated far enough to have different characteristics. Experimental results on the ASVspoof 2019 physical access scenario show that the proposed method reduces the equal error rate by about 44% relative to the baseline.
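The difference between single-segment input and multiple points input can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the points are chosen or how they are combined, so the function names (`pad_to_length`, `sliding_segments`, `multiple_points_input`), the evenly spaced start positions, the wrap-around padding of short utterances, and the stacking of segments along a leading axis are all assumptions made for illustration.

```python
import numpy as np


def pad_to_length(features, seg_len):
    """Repeat a (n_bins, n_frames) matrix along time until it has >= seg_len frames.

    Assumption: short utterances are extended by wrap-around repetition.
    """
    n_frames = features.shape[1]
    if n_frames >= seg_len:
        return features
    reps = -(-seg_len // n_frames)  # ceiling division
    return np.tile(features, (1, reps))


def sliding_segments(features, seg_len, hop):
    """Baseline segmentation: fixed-length segments from a sliding window.

    Returns an array of shape (n_segments, n_bins, seg_len); a model would
    take one of these segments as input at a time.
    """
    features = pad_to_length(features, seg_len)
    n_frames = features.shape[1]
    starts = range(0, n_frames - seg_len + 1, hop)
    return np.stack([features[:, s:s + seg_len] for s in starts])


def multiple_points_input(features, seg_len, n_points):
    """Hypothetical multiple points input: n_points segments taken together.

    Assumption: start positions are spread evenly over the utterance so the
    segments are as far apart as possible. Returns (n_points, n_bins, seg_len),
    which could be fed to a CNN as n_points channels or parallel branches.
    """
    features = pad_to_length(features, seg_len)
    n_frames = features.shape[1]
    starts = np.linspace(0, n_frames - seg_len, n_points).astype(int)
    return np.stack([features[:, s:s + seg_len] for s in starts])


# Toy utterance: 4 frequency bins, 100 frames.
utterance = np.arange(4 * 100, dtype=np.float32).reshape(4, 100)
single = sliding_segments(utterance, seg_len=20, hop=10)        # shape (9, 4, 20)
multi = multiple_points_input(utterance, seg_len=20, n_points=3)  # shape (3, 4, 20)
```

In the baseline, each of the nine segments would be scored independently, so the model sees 20 frames at a time. In the multiple points sketch, one input covers three segments starting at frames 0, 40, and 80, so a single forward pass sees 60 frames drawn from distant parts of the utterance.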