CONTENT-INSENSITIVE DYNAMIC LIP FEATURE EXTRACTION FOR VISUAL SPEAKER AUTHENTICATION AGAINST DEEPFAKE ATTACKS

Zihao Guo (Shanghai Jiao Tong University); Shilin Wang (SEIEE, Shanghai Jiao Tong University)

06 Jun 2023

Recent research has shown that lip-based speaker authentication systems can achieve good authentication performance. However, with emerging deepfake technology, attackers can produce high-fidelity talking videos of a user, posing a serious threat to these systems. To counter this threat, we propose a new deep neural network for lip-based visual speaker authentication that defends against both human imposters and deepfake attacks. A dynamic enhanced block with a context modeling scheme is designed to capture a user’s unique talking habits by learning from his/her lip movements. Meanwhile, a cross-modality content-guided loss is designed to help extract discriminative features when learning from the different lip movements of a user uttering different content, which makes the proposed method insensitive to content variation. Experiments on the GRID dataset show that the proposed method not only outperforms three state-of-the-art methods but also simplifies the training process and reduces the training cost.

Tags:

biometrics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
