PFTA-Net: Progressive Feature Alignment and Temporal Attention Fusion Networks for Video Inpainting
Yanni Zhang, Zhiliang Wu, Yan Yan
SPS
The goal of video inpainting is to fill missing regions of a video sequence with plausible and coherent content. Because cameras and objects move, the reference frames and the target frame are not aligned, so the useful information in the reference frames cannot be fully exploited. Temporal alignment therefore plays an important role in video inpainting. Some studies divide a long-range alignment into multiple sub-alignments and process them step by step, but error accumulation is then inevitable. In this paper, we present a novel progressive feature alignment and temporal attention fusion network, namely PFTA-Net. Specifically, we design a progressive feature alignment module, which combines sub-alignments with a progressive refinement scheme to achieve more accurate motion compensation. After alignment, a temporal attention fusion module computes temporal attention weights for each aligned reference frame feature, yielding modulated features from which the target frame is reconstructed. Extensive quantitative and qualitative evaluations demonstrate the effectiveness of the proposed video inpainting network.
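To make the fusion step concrete, the following is a minimal NumPy sketch of one way temporal attention weights could modulate aligned reference features. The abstract does not specify how the weights are computed, so the scaled dot-product similarity, the softmax over frames, the summation used for fusion, and the function name `temporal_attention_fusion` are all assumptions made for illustration, not the module described in the paper.

```python
import numpy as np

def temporal_attention_fusion(target, refs):
    """Fuse aligned reference features using per-pixel temporal attention.

    target: array of shape (C, H, W), target-frame features.
    refs:   array of shape (T, C, H, W), reference features already
            aligned to the target frame.
    Returns (fused, weights) with shapes (C, H, W) and (T, H, W).
    """
    channels = target.shape[0]
    # Assumption: similarity is a scaled dot product over channels,
    # computed independently at every spatial position.
    scores = np.einsum("chw,tchw->thw", target, refs) / np.sqrt(channels)
    # Numerically stable softmax across the T reference frames.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Modulate each reference feature by its attention weight.
    modulated = refs * weights[:, None, :, :]
    # Assumption: fusion is a plain sum of the modulated features.
    fused = modulated.sum(axis=0)
    return fused, weights

rng = np.random.default_rng(0)
target_feat = rng.standard_normal((8, 4, 4))
ref_feats = rng.standard_normal((3, 8, 4, 4))
fused_feat, attn = temporal_attention_fusion(target_feat, ref_feats)
print(fused_feat.shape, attn.shape)
```

In this sketch, reference features that resemble the target at a given pixel receive larger weights there, so poorly aligned references contribute less to the fused feature. A learned variant would typically embed the features with convolutions before computing the similarity.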