Exploring Dual Stream Global Information for Image Captioning

Tiantao Xian, Zhixin Li, Tianyu Chen, Huifang Ma

Length: 00:05:18
13 May 2022

In recent years, image captioning methods based on the encoder-decoder framework have achieved promising results, but most of them make little use of global information. In general, visual global information can provide more fine-grained details for recognizing small objects, while textual global information provides a coarse understanding of the visual scene. In this paper, we propose the Dual Global Enhanced Transformer (DGET) to explicitly utilize both visual and textual global information. In the encoding stage, a novel Global Enhanced Encoder (GEE) combines two visual features with complementary properties to obtain a global enhanced visual representation. In the decoding stage, we propose a Global Enhanced Decoder (GED) that explicitly utilizes the textual global information. To validate our model, we conduct extensive experiments on the COCO image captioning dataset and achieve superior performance over many state-of-the-art methods.
