MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING
Tianyu Chen, Zhixin Li, Jiahui Wei, Tiantao Xian
SPS
Length: 00:08:23
Modeling the internal relationships among image objects has contributed significantly to the development of image captioning, especially when combined with the Transformer architecture. However, most of these methods compute only the relationships between entities and ignore the information between entities and the background. In addition, the ways of exploring relational information inside the image can also be extended. In this paper, we continue to explore the relationships between objects from both internal and external perspectives, and embed the image's global information into the internal relationship module. To validate the effectiveness of our model, we conduct extensive experiments on the widely used MSCOCO dataset and achieve state-of-the-art performance on both the online and offline test sets.
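
To make the idea of letting objects attend to background context concrete, the following is a minimal sketch and not the authors' model. It assumes that the global image feature is the mean of the region features and that it is appended as one extra key/value in a single-head scaled dot-product attention with no learned projections. The names `attend_with_global` and `mean_vector` are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    # Element-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attend_with_global(regions):
    """Each region attends to all regions plus one global vector.

    Assumption: the global vector is the mean of the region features.
    No learned query/key/value projections are used in this sketch.
    """
    d = len(regions[0])
    global_vec = mean_vector(regions)
    keys = regions + [global_vec]  # last key stands in for background context
    outputs, weights = [], []
    for q in regions:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wi * k[j] for wi, k in zip(w, keys)) for j in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Three toy 2-dimensional region features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attend_with_global(regions)
```

Each row of `weights` has one more entry than there are regions. The last entry is the share of attention given to the global vector, which is how entity-to-background information enters the output in this sketch.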