Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects
Bryan Cardenas Guevara, Devanshu Arya, Deepak K. Gupta
Recent developments in generative models have enabled the generation of diverse and high-fidelity images. In particular, layout-to-image generation models have gained significant attention due to their capability to generate realistic and complex images containing distinct objects. These models are generally conditioned on either semantic layouts or textual descriptions. However, unlike for natural images, providing such auxiliary information can be extremely hard in domains such as biomedical imaging and remote sensing. In this work, we propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring their contextual information during the generation process. Based on a vector-quantized variational autoencoder (VQ-VAE) backbone, our model learns to preserve spatial coherency within an image as well as semantic coherency through the use of powerful autoregressive priors. An advantage of our approach is that the generated samples are accompanied by object-level annotations. The efficacy of our approach is demonstrated through application to medical imaging datasets, where we show that augmenting the training set with samples generated by our approach improves the performance of existing models.
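The abstract only names the building blocks, so the following is a minimal, illustrative sketch of the generic pattern it describes: a VQ-VAE that compresses an image into a grid of discrete codes, over which a separate autoregressive prior would later be trained and sampled. The layer sizes, codebook size, and class names here are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal VQ-VAE sketch (assumed architecture, not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with straight-through gradients."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                                     # z_e: (B, C, H, W)
        B, C, H, W = z_e.shape
        flat = z_e.permute(0, 2, 3, 1).reshape(-1, C)           # (B*H*W, C)
        dist = torch.cdist(flat, self.codebook.weight)          # (B*H*W, K)
        idx = dist.argmin(dim=1)                                # discrete code indices
        z_q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # Codebook + commitment losses; straight-through estimator for decoder gradients.
        vq_loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx.view(B, H, W), vq_loss

class VQVAE(nn.Module):
    def __init__(self, in_ch=3, code_dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 128, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(128, code_dim, 4, 2, 1),
        )
        self.quant = VectorQuantizer(code_dim=code_dim)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(code_dim, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, in_ch, 4, 2, 1),
        )

    def forward(self, x):
        z_q, codes, vq_loss = self.quant(self.enc(x))
        recon = self.dec(z_q)
        return recon, codes, F.mse_loss(recon, x) + vq_loss

# After the VQ-VAE is trained, an autoregressive prior (e.g. a PixelCNN- or
# transformer-style model) is fit over the (B, H, W) grids of `codes`;
# sampling that prior and decoding the sampled codes yields new images.
```

How the prior is conditioned so that the generated samples come with object-level annotations is specific to the paper and is not reproduced here.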