Truncated Lottery Ticket For Deep Pruning

Iraj Saniee, Lisa Zhang, Bradley Magnetta

Length: 00:14:56
07 Oct 2022

Recently, the application of Transformers in computer vision has shown the potential of this new paradigm. However, standard Multi-head Self-Attention (MSA) faces an explosion in computational cost as the input changes from a text sequence to an image, and MSA is computationally redundant for images. In this paper, we propose a new backbone network that combines window-based attention and convolutional neural networks, named ConMW Transformer, introducing convolution into the Transformer to help it converge quickly and improve accuracy. ConMW Transformer uses a hierarchical architecture, and an inductive bias is incorporated during tokenization and feature projection. We reduce the computational cost by performing the attention operation within windows after partitioning the feature map, while allowing connections between multiple heads for a more appropriate joint representation. We also apply a large-kernel convolution after the window-based attention to merge features between windows, which helps maintain the superiority of attention in global context modelling. With only ImageNet-1K pre-training at 224 × 224 resolution, our base model achieves 83.7% top-1 accuracy on ImageNet-1K and 49.9 mIoU for semantic segmentation on ADE20K.
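The two ideas the abstract describes, attention computed only within non-overlapping windows of the partitioned feature map and a large-kernel convolution that afterwards merges features across window boundaries, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; the layer choices, channel width, 7×7 window, and 13×13 kernel are hypothetical assumptions for illustration only.

```python
# Hypothetical sketch (not the ConMW Transformer code): window-local
# multi-head attention followed by a large-kernel depthwise convolution
# that mixes information between windows.
import torch
import torch.nn as nn


class WindowAttentionConvBlock(nn.Module):
    def __init__(self, dim=96, num_heads=3, window=7, conv_kernel=13):
        super().__init__()
        self.window = window
        # Attention is restricted to each window, so its cost grows
        # linearly with image size instead of quadratically.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Large-kernel depthwise convolution merges features between
        # neighbouring windows after the window-local attention.
        self.conv = nn.Conv2d(dim, dim, conv_kernel,
                              padding=conv_kernel // 2, groups=dim)

    def forward(self, x):  # x: (B, C, H, W), H and W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into (H/w * W/w) windows of w*w tokens.
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        t = self.norm(t)
        t, _ = self.attn(t, t, t)  # attention within each window
        # Reverse the window partition back to a (B, C, H, W) feature map.
        t = t.view(B, H // w, W // w, w, w, C)
        t = t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        # Merge features across window boundaries with the large-kernel conv.
        return x + self.conv(t)


# Tiny usage example on a 56x56 feature map.
block = WindowAttentionConvBlock()
out = block(torch.randn(1, 96, 56, 56))
print(out.shape)  # torch.Size([1, 96, 56, 56])
```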
