Two Distillation Perspectives Based On Tanimoto Coefficient

Hongqiao Shu

Length: 00:14:44

07 Oct 2022

Fine-grained visual classification (FGVC) aims to accurately identify the subordinate categories of a target class. Convolutional neural network (CNN) based methods have shown that attention mechanisms can enhance the representation of local regions and improve recognition accuracy. Recently, the vision transformer (ViT) has shown great potential in image classification by taking advantage of its inherent self-attention mechanism and its ability to acquire global information early. However, this global information acquisition also draws the irrelevant environment into the interaction process, which makes it difficult for fine-grained tasks that rely on local differences to quickly learn discriminative features. To this end, we propose a hybrid network termed Mask-ViT, which effectively avoids environmental interference and expresses more robust features by focusing on the instance itself. Specifically, Contour Knowledge Embedding (CKE) is employed to transfer prior location information to ViT and guide the subsequent recognition. Experiments on three benchmarks demonstrate the effectiveness of the proposed method.

Tags:

International Conference on Image Processing

IEEE ICIP 2022

icip

Value-Added Bundle(s) Including this Product

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
