Tell Your Story: Text-Driven Face Video Synthesis With High Diversity via Adversarial Learning
Xia Hou, Meng Sun, Wenfeng Song
SPS
Face synthesis is a rapidly growing area of research in computer vision. Text-driven face synthesis is particularly flexible, but challenges remain in fusing the semantics of text and images and in generating diverse faces. To address these challenges, we propose a cross-modality adversarial learning framework that generates highly diverse face videos corresponding to given text descriptions. We encode text and images into a common latent space and align text and image features to control the synthesis of face attributes. We design a novel auto-encoder with a face identity discriminator that enlarges the margin between different individuals, increasing the diversity of the generated faces while maintaining semantic coherence between text and images. Our method is validated on the recently released Multimodal VoxCeleb dataset. Our code is publicly available at https://github.com/sunmeng7/TYS.git.
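To illustrate the margin-enlarging idea behind the identity discriminator, the sketch below shows a generic hinge-style loss on cosine similarity between identity embeddings: same-identity pairs are pulled together, while different identities are pushed apart by at least a margin. The function names, the hinge form, and the margin value are our own illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_margin_loss(emb_a, emb_b, same_identity, margin=0.5):
    # Hypothetical hinge loss in cosine space:
    #  - same identity: penalize similarity below 1
    #  - different identity: penalize similarity above (1 - margin)
    sim = cosine_sim(emb_a, emb_b)
    if same_identity:
        return max(0.0, 1.0 - sim)
    return max(0.0, sim - (1.0 - margin))

rng = np.random.default_rng(0)
x = rng.normal(size=128)
print(identity_margin_loss(x, x, True))   # identical embeddings, same identity -> 0.0
print(identity_margin_loss(x, x, False))  # identical embeddings, different identity -> 0.5
```

A margin of this kind widens the separation between individuals in the embedding space, which is one way to encourage diversity among generated faces.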