Model-Agnostic Adversarial Example Detection Through Logit Distribution Learning
Yaopeng Wang, Lehui Xie, Ximeng Liu, Jia-Li Yin, Tingjie Zheng
SPS
Length: 00:08:15
Recent research on vision-based tasks has improved greatly thanks to advances in deep learning. However, deep models have been found to be vulnerable to adversarial attacks, in which the original inputs are maliciously manipulated to cause dramatic shifts in the outputs. In this paper, we focus on adversarial attacks on image classifiers built with deep neural networks and propose a model-agnostic approach to detecting adversarial inputs. We argue that the logit semantics of adversarial inputs evolve differently from those of original inputs, and we construct a logits-based feature embedding for effective representation learning. We then train an LSTM network on the resulting sequence of logits-based features to detect adversarial examples. Experimental results on the MNIST, CIFAR-10, and CIFAR-100 datasets show that our method achieves state-of-the-art accuracy in detecting adversarial examples and shows strong generalizability.
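To make the detection pipeline concrete, here is a minimal NumPy sketch of an LSTM that reads a sequence of logit vectors and outputs a probability that the input is adversarial. It is not the authors' implementation. The abstract does not say how the logit sequence or the logits-based embedding is constructed, so this sketch assumes each time step is simply a raw logit vector of length `num_classes`. The class name `LSTMDetector`, its parameters, and the random initialization are all hypothetical. The weights are untrained; in practice they would be fitted on labeled clean and adversarial examples, for instance with a binary cross-entropy loss.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMDetector:
    """Hypothetical detector: an LSTM over a sequence of logit vectors -> P(adversarial).

    Assumption: each sequence step is a raw logit vector; the paper's actual
    logits-based embedding is not specified in the abstract.
    """

    def __init__(self, num_classes, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        d, h = num_classes, hidden_size
        scale = 1.0 / np.sqrt(h)
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.uniform(-scale, scale, size=(4 * h, d))
        self.U = rng.uniform(-scale, scale, size=(4 * h, h))
        self.b = np.zeros(4 * h)
        # Linear read-out from the final hidden state to a single score.
        self.w_out = rng.uniform(-scale, scale, size=h)
        self.b_out = 0.0
        self.hidden_size = h

    def forward(self, logit_seq):
        """logit_seq: array of shape (T, num_classes). Returns a probability in (0, 1)."""
        h_t = np.zeros(self.hidden_size)
        c_t = np.zeros(self.hidden_size)
        for x_t in np.asarray(logit_seq, dtype=float):
            z = self.W @ x_t + self.U @ h_t + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c_t = f * c_t + i * g      # update the cell state
            h_t = o * np.tanh(c_t)     # expose the gated hidden state
        # The final hidden state summarizes how the logits evolved over the sequence.
        return float(sigmoid(self.w_out @ h_t + self.b_out))


detector = LSTMDetector(num_classes=10, hidden_size=16)
# Hypothetical 5-step sequence of 10-class logits (random stand-in data).
seq = np.random.default_rng(1).normal(size=(5, 10))
p = detector.forward(seq)
print(f"P(adversarial) = {p:.3f}")
```

Reading the whole sequence with a recurrent model, rather than classifying a single logit vector, reflects the abstract's premise that adversarial inputs differ in how their logits evolve. A threshold on `p` (for example 0.5, also an assumption) would turn the score into a detect/accept decision.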