OPTIMIZING TRANSFORMER FOR LARGE-HOLE IMAGE INPAINTING

Zixuan Li, Yuan-Gen Wang

SPS

Lecture 11 Oct 2023

In recent years, hybrid models that use a Convolutional Neural Network (CNN) to optimize a Transformer have made great progress in image inpainting. However, the effective receptive field of a CNN grows slowly when processing large-hole regions, which significantly limits overall performance. To alleviate this problem, this paper proposes a new Transformer-CNN hybrid framework (termed PUT+) that introduces fast Fourier convolution (FFC) into the CNN-based refinement network. The proposed framework also introduces an improved Patch-based Vector Quantized Variational Auto-Encoder (P-VQVAE+). Its encoder transforms the masked region into non-overlapping, patch-based, unquantized feature vectors that serve as the input of the Un-Quantized Transformer (UQ-Transformer). Its decoder restores the masked region from the quantized features predicted by the UQ-Transformer while keeping the unmasked region unchanged. Extensive experiments show that the proposed method outperforms the state of the art by a large margin, especially for image inpainting with large masked areas.
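
The receptive-field argument in the abstract can be illustrated with a small NumPy experiment. This is only a sketch under stated assumptions, not the PUT+ implementation: `spectral_transform` is a hypothetical, single-channel simplification of the global branch of a fast Fourier convolution (a pointwise operation applied in the frequency domain), and `conv3x3`, `affected_pixels`, and the random weights are illustrative names and values that do not come from the paper. It counts how many output pixels change when one input pixel is perturbed.

```python
import numpy as np


def spectral_transform(x, weight):
    """Scale each frequency of x by a complex weight, then return to image space.

    Hypothetical simplification of the global (Fourier) branch of an FFC layer.
    """
    freq = np.fft.rfft2(x)            # real 2-D FFT, shape (H, W // 2 + 1)
    freq = freq * weight              # pointwise operation in the frequency domain
    return np.fft.irfft2(freq, s=x.shape)


def conv3x3(x, kernel):
    """Plain 3x3 cross-correlation with zero padding, for comparison."""
    h, w = x.shape
    padded = np.pad(x, 1)             # zero padding by default
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def affected_pixels(op, shape=(32, 32)):
    """Count output pixels that change when a single input pixel is perturbed."""
    base = np.zeros(shape)
    bumped = base.copy()
    bumped[shape[0] // 2, shape[1] // 2] = 1.0
    diff = np.abs(op(bumped) - op(base))
    return int((diff > 1e-9).sum())


rng = np.random.default_rng(0)
# Random stand-ins for learned parameters (assumption, for illustration only).
weight = rng.normal(size=(32, 17)) + 1j * rng.normal(size=(32, 17))
kernel = rng.normal(size=(3, 3))

local_count = affected_pixels(lambda x: conv3x3(x, kernel))
global_count = affected_pixels(lambda x: spectral_transform(x, weight))

print("3x3 convolution touches", local_count, "pixels")
print("spectral transform touches", global_count, "of", 32 * 32, "pixels")
```

A single 3x3 convolution spreads the perturbation over a 3x3 neighbourhood only, so many stacked layers are needed before information from outside a large hole reaches its centre. The frequency-domain operation spreads it across essentially the whole image in one layer, which is the property the abstract relies on when it adds FFC to the refinement network.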

Tags:

image inpainting

transformer

fast Fourier convolution

receptive field

information loss
