  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
Poster 09 Oct 2023

Recent methods for saliency prediction on 360° images show that better results can be obtained by using equirectangular projection (ERP) images as input. Because their receptive field is limited, existing convolution-based networks cannot capture long-range information in complex 360° images. Although the transformer can inherently capture long-range correlations through self-attention, its need for large datasets limits its application to saliency prediction on 360° images. In this paper, we present a novel Multi-scale Transformer framework for Saliency prediction on 360° images (MTSal360). The Multi-scale Transformer Module (MTM) is designed to aggregate long-range contextual information. It includes a Convolutional Positional Encoder (CPE), which enables the model to be trained and tested on the cubic and ERP formats separately, addressing the shortage of data. Experiments on two public datasets show that MTSal360 achieves better results than state-of-the-art methods.
