MULTI-TASK LEARNING IMPROVES SYNTHETIC SPEECH DETECTION

Yichuan Mo, Shilin Wang

Length: 00:06:56

08 May 2022

With the development of deep learning, synthetic speech has become increasingly realistic and increasingly able to spoof Automatic Speaker Verification (ASV) devices. Many algorithms have been proposed to detect this malicious attack, either by mining more effective hand-crafted features or by designing more powerful networks. In this paper, we observe that deepening the network impairs its performance in detecting unknown attacks. We therefore treat synthetic speech detection as an out-of-distribution (OOD) generalization problem and enhance the robustness of networks through multi-task learning. In our system, three auxiliary tasks assist synthetic speech detection: bonafide speech reconstruction, spoofing voice conversion, and speaker classification. Experimental results show that our approach can be applied to multiple architectures and significantly improves performance on both known attacks (development set) and unknown attacks (evaluation set). In addition, our best-performing network is competitive with recent state-of-the-art (SOTA) systems, which demonstrates the potential of multi-task learning in synthetic speech detection.
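As a rough illustration of the multi-task idea in the abstract, the sketch below combines a main detection loss with three auxiliary losses in a weighted sum. This is only an assumed formulation: the abstract does not state how the losses are defined or weighted, so the function names, the choice of binary cross-entropy, mean squared error, and softmax cross-entropy, and the `weights` parameter are all hypothetical.

```python
import math


def binary_cross_entropy(p_spoof, is_spoof):
    """Detection loss for one utterance (assumed form).

    p_spoof is the predicted probability of "spoofed"; is_spoof is 0 or 1.
    """
    eps = 1e-12
    p = min(max(p_spoof, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(is_spoof * math.log(p) + (1 - is_spoof) * math.log(1.0 - p))


def mean_squared_error(predicted, target):
    """Assumed loss for the reconstruction and conversion tasks."""
    return sum((a - b) ** 2 for a, b in zip(predicted, target)) / len(target)


def softmax_cross_entropy(logits, label):
    """Assumed loss for the speaker classification task."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multi_task_loss(detection, reconstruction, conversion, speaker,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the main loss and the three auxiliary losses.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    w_rec, w_conv, w_spk = weights
    return (detection
            + w_rec * reconstruction
            + w_conv * conversion
            + w_spk * speaker)
```

In a sketch like this, the auxiliary terms act only as training-time regularizers on a shared encoder; at test time only the detection output would be used.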

Tags:

synthetic speech detection

speech anti-spoofing

multi-task learning

adversarial learning

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
