Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network
Juntae Kim (SK Telecom); Sung Min Ban (SK Telecom)
SPS
For automatic speaker verification systems, spoof speech detection (SSD) is an essential countermeasure. Although SSD with magnitude features in the frequency domain has shown promising results, phase information can also help capture the artefacts of certain spoofing attacks. Thus, both magnitude and phase features must be considered to ensure generalization across diverse types of spoofing attacks. In this study, we found that magnitude and phase features differ greatly in randomness, which can hinder feature-level fusion by a backend neural network. To address this, we propose a phase network that reduces this difference and makes Res2Net-based feature-level fusion feasible. To validate our SSD system for practical environments, we consider both known- and unknown-type SSD scenarios. Our SSD system delivers competitive results compared with other state-of-the-art SSD systems in all scenarios.
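To make the two feature types concrete, the following is a minimal sketch of extracting magnitude and phase features from a short-time Fourier transform. It is not the authors' front end: the function name `stft_mag_phase`, the Hann window, and the frame length and hop size are all assumptions chosen for illustration.

```python
import numpy as np


def stft_mag_phase(signal, frame_len=512, hop=160):
    """Return (log_magnitude, phase), each of shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration; window type, frame length, and hop
    are assumed values, not settings taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame gives the complex spectrum.
    spec = np.fft.rfft(frames, axis=1)
    # Magnitude feature: log-compressed absolute value (small offset avoids log(0)).
    log_mag = np.log(np.abs(spec) + 1e-8)
    # Phase feature: angle of the complex spectrum, wrapped to [-pi, pi].
    phase = np.angle(spec)
    return log_mag, phase


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waveform = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz audio
    log_mag, phase = stft_mag_phase(waveform)
    print(log_mag.shape, phase.shape)
```

Both outputs share the same time-frequency shape, so they could be stacked or processed by separate branches before fusion. How the phase network transforms the phase feature is not specified in the abstract and is not modelled here.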