07 Oct 2022

This paper presents an efficient multi-scale vision transformer, called CBPT, that capably serves as a general-purpose backbone for computer vision. A challenging issue in transformer design is that window self-attention (WSA) often limits the information transmission of each token, whereas enlarging WSA's receptive field is very expensive to compute. To address this issue, we develop the Locally-Enhanced Window Self-attention mechanism, which doubles the receptive field while keeping computational complexity similar to that of typical WSA. In addition, we propose Information-Enhanced Patch Merging, which mitigates the information loss incurred in sampling the attention map. Incorporating these designs and the Cross Block Partial connection, CBPT not only significantly surpasses Swin by +1 box AP and mask AP on COCO object detection and instance segmentation, but also has 30% fewer parameters and 35% fewer FLOPs than Swin.
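For context on the trade-off the abstract describes: in standard (Swin-style) window self-attention, each token attends only within a fixed-size local window, so cost scales with window area rather than with the whole image, and naively enlarging the window to widen the receptive field quickly becomes expensive. The paper's Locally-Enhanced WSA is not detailed on this page; the sketch below shows only the baseline WSA it builds on, assuming Swin-style non-overlapping windows. The names `window_partition` and `WindowSelfAttention` are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    # (B, H, W, C) -> (B * num_windows, ws * ws, C): group tokens by window
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

class WindowSelfAttention(nn.Module):
    """Standard window self-attention: tokens attend only inside their own
    ws x ws window, so total cost is linear in the number of windows."""
    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.ws = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C); H and W assumed divisible by the window size
        B, H, W, C = x.shape
        win = window_partition(x, self.ws)      # (B * nW, ws*ws, C)
        out, _ = self.attn(win, win, win)       # attention within each window
        # reverse the partition back to a (B, H, W, C) feature map
        nH, nW = H // self.ws, W // self.ws
        out = out.view(B, nH, nW, self.ws, self.ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# usage: a 56x56 feature map with 96 channels, 7x7 windows
x = torch.randn(2, 56, 56, 96)
y = WindowSelfAttention(dim=96, window_size=7, num_heads=3)(x)
print(y.shape)  # torch.Size([2, 56, 56, 96])
```

With window side w on an H x W feature map, this costs O(H·W·w²·C) overall, so doubling the window's side length roughly quadruples the attention cost; per the abstract, CBPT's contribution is doubling the receptive field without paying that quadratic price.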
