Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network
Lian Huang, Chi-Man Pun
SPS
Length: 00:15:14
Most automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. To address this issue, we propose a novel model that combines segment-based linear filter bank features with an attention-enhanced DenseNet-BiLSTM network. First, silent segments are selected from each speech signal using the short-term zero-crossing rate and short-term energy. If the silent segments together contain too little data, the decaying tails of the signal are selected instead. Second, linear filter bank features are extracted from the selected segments in the relatively high-frequency range. Finally, an attention-enhanced DenseNet-BiLSTM architecture designed to mitigate overfitting is built. To validate the model, we used two datasets: BTAS2016 and ASVspoof2017. Experiments show that the attention-enhanced DenseNet-BiLSTM model with segment-based linear filter bank features achieves the best performance. Compared with the baseline system based on constant Q cepstral coefficients (CQCC) and a Gaussian mixture model (GMM), the proposed model achieves relative improvements of 91.68% and 74.04% on the two datasets, respectively.
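The first two steps of the front end (segment selection and high-frequency linear filter bank extraction) can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation. The abstract gives no frame sizes, thresholds, silence rule, or band edges, so the following are all assumptions: a frame counts as silent when its energy is below a fraction of the utterance's peak frame energy *and* its zero-crossing rate is below a fixed limit (the classical energy/ZCR endpoint rule); "decaying tail" is read as the last few frames of the utterance; and "relatively high frequency" is taken as 4 kHz up to Nyquist at 16 kHz sampling. The function names `frame_signal`, `select_segments`, `linear_filterbank` and `segment_lfb_features` are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, zero-padding very short inputs."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def select_segments(x, frame_len=400, hop=160, energy_ratio=0.05,
                    zcr_max=0.3, min_frames=20, tail_frames=50):
    """Keep low-energy, low-ZCR frames; fall back to the final frames if too few.

    All thresholds are hypothetical defaults, not values from the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)  # short-term energy per frame
    # Short-term zero-crossing rate: fraction of adjacent samples whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    silent = (energy < energy_ratio * energy.max()) & (zcr < zcr_max)
    if silent.sum() >= min_frames:
        return frames[silent]
    # Assumed reading of "decaying tail": the last tail_frames frames of the utterance.
    return frames[-tail_frames:]


def linear_filterbank(n_filters, n_fft, fs, f_lo, f_hi):
    """Triangular filters whose edges are spaced linearly in Hz over [f_lo, f_hi]."""
    edges = np.linspace(f_lo, f_hi, n_filters + 2)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # min() of the two slopes gives the triangle; clip zeroes everything outside it.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def segment_lfb_features(x, fs, frame_len=400, hop=160, n_fft=512,
                         n_filters=40, f_lo=4000.0):
    """Log linear-filter-bank energies of the selected frames, over f_lo..fs/2 only."""
    frames = select_segments(x, frame_len, hop)
    window = np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1)) ** 2
    fb = linear_filterbank(n_filters, n_fft, fs, f_lo, fs / 2)
    return np.log(power @ fb.T + 1e-10)  # shape: (n_selected_frames, n_filters)


# Usage: half a second of a 300 Hz tone followed by half a second of digital silence.
fs = 16000
t = np.arange(fs // 2) / fs
x = np.concatenate([np.sin(2 * np.pi * 300 * t), np.zeros(fs // 2)])
feats = segment_lfb_features(x, fs)
print(feats.shape)  # (48, 40)
```

Only the 48 frames lying entirely in the silent half pass the energy and ZCR test, so the feature matrix has 48 rows of 40 filter outputs. Restricting the filter bank to the upper band is one plausible way to focus on the channel and playback-device artifacts that replayed audio leaves in otherwise speech-free regions, but the band limits here are placeholders.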
Chairs:
Daniele Giacobello