STRUCTURE-AWARE GENERATIVE ADVERSARIAL NETWORK FOR TEXT-TO-IMAGE GENERATION

Wenjie Chen, Zhangkai Ni, Hanli Wang

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Lecture 10 Oct 2023

Text-to-image generation aims at synthesizing photo-realistic images from textual descriptions. Existing methods typically align images with the corresponding texts in a joint semantic space. However, the presence of the modality gap in the joint semantic space leads to misalignment. Meanwhile, the limited receptive field of the convolutional neural network leads to structural distortions of generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed for (1) semantically aligning multimodel features in the joint semantic space in a learnable manner; and (2) improving the structure and contour of generated images by the designed content-invariant negative samples. Compared with the state-of-the-art models, experimental results show that SaGAN achieves over 30.1% and 8.2% improvements in terms of FID on CUB and COCO datasets, respectively.

Tags:

Text-to-image generation

generative adversarial network

negative data augmentation