Attend, Correct And Focus: A Bidirectional Correct Attention Network For Image-Text Matching
Yang Liu, Huaqiu Wang, Fanyang Meng, Mengyuan Liu, Hong Liu
SPS
Length: 00:06:49
The image-text matching task aims to learn fine-grained correspondences between images and sentences. Existing methods use attention mechanisms to learn these correspondences by attending to all fragments, without considering the relationship between fragments and global semantics, which inevitably leads to semantic misalignment among irrelevant fragments. To avoid such misalignment, we propose a Bidirectional Correct Attention Network (BCAN), which leverages both global and local similarities to reassign the attention weights. Specifically, we introduce a global correct unit to correct the attention focused on relevant fragments in irrelevant semantics, and a local correct unit to correct the attention focused on irrelevant fragments in relevant semantics. Experiments on the Flickr30K and MSCOCO datasets verify the effectiveness of the proposed BCAN, which outperforms both previous attention-based methods and state-of-the-art methods. Code can be found at: https://github.com/liuyyy111/BCAN.
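
The idea of reassigning attention weights with global and local similarities can be illustrated with a small sketch. This is not the BCAN implementation; the function names (`corrected_attention`, `mean_vector`), the mean-threshold rule for the local correction, the clipped global gate, and the toy feature vectors are all assumptions made for illustration only. For one word, it computes attention over image regions, zeroes the weights of regions whose local similarity is below average, and then scales the result by the global image-sentence similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    """Average a list of vectors (a stand-in for a global feature)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def corrected_attention(word, regions, global_sim, temperature=0.1):
    """Attention of one word over regions, with two assumed corrections."""
    local_sims = [cosine(word, r) for r in regions]
    weights = softmax(local_sims, temperature)

    # Assumed local correction: drop regions whose local similarity is
    # below the mean, then renormalize the remaining weights.
    threshold = sum(local_sims) / len(local_sims)
    kept = [w if s >= threshold else 0.0 for w, s in zip(weights, local_sims)]
    total = sum(kept)  # > 0: the best-matching region always survives
    kept = [w / total for w in kept]

    # Assumed global correction: gate the attention by the global
    # image-sentence similarity, clipped at zero.
    gate = max(global_sim, 0.0)
    return [gate * w for w in kept]

# Toy features (hypothetical values, 2-dimensional for readability).
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
words = [[1.0, 0.1], [0.9, 0.2]]
global_sim = cosine(mean_vector(regions), mean_vector(words))
weights = corrected_attention(words[0], regions, global_sim)
```

In this toy setup the second region is nearly orthogonal to the word, so its weight is set to zero, and the remaining weights sum to the global similarity rather than to one. A mismatched image-sentence pair (global similarity at or below zero) would have its attention suppressed entirely under this assumed gating rule.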