A Deep Learning Adversarial Example Defense Scheme Based on Iterative Autoencoders

杨浚宇

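(Shanghai Institute of Microsystem and Information Technology, Shanghai 200050, China; School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100029, China)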

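Abstract:

In recent years, deep learning has shown excellent performance in computer vision. However, researchers have found that deep learning systems are not robust: adding a small perturbation that humans cannot perceive to the input can cause a deep learning model to fail. Researchers call the inputs that cause such failures adversarial examples. We propose the iterative autoencoder, a new defense against adversarial examples, whose principle is to push adversarial examples that lie far from the manifold back to the vicinity of the manifold. We first feed the input to the iterative autoencoder and then pass the reconstructed output to the classifier. On normal examples, the classification accuracy after the iterative autoencoder is similar to that on the original examples, so the defense does not significantly degrade the performance of the deep learning model. On adversarial examples, our experiments show that the defense retains high classification accuracy and a low attack success rate even against state-of-the-art attacks.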

Keywords: adversarial examples; autoencoder; deep learning; image classification

DOI：10.19363/J.cnki.cn10-1380/tn.2019.11.03

Received: 2019-01-06; Revised: 2019-03-20

Funding:

IDAE: Iterative Denoising Autoencoder based Deep Learning Model Enhancement Mechanism against Adversarial Examples

YANG Junyu

Shanghai Institute of Microsystem and Information Technology, Shanghai 200050, China; School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100029, China

Abstract:

Deep learning has recently shown impressive performance in computer vision. However, researchers have found that deep learning systems are not robust enough: a model can fail when an attacker adds specially crafted perturbations that are imperceptible to humans. Researchers call the examples that cause the model to fail adversarial examples. We propose a new defense mechanism named the iterative denoising autoencoder (IDAE). The intuition behind IDAE is to iteratively push examples that lie far from the manifold back onto the manifold. We apply IDAE to test examples and then send the reconstructed examples to the classifier. We show that, for normal examples, the examples reconstructed by IDAE have classification accuracy comparable to that of their original versions, suggesting that the reconstructed examples are on the manifold and that the defense does not decrease the performance of the model. For adversarial examples, we show that the defense achieves high classification accuracy and a low attack success rate against state-of-the-art attacks in both grey-box and white-box settings.

Key words: adversarial examples; autoencoder; deep learning; image classification
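
The pipeline described in the abstract (reconstruct the input repeatedly, then classify the reconstruction) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `iterative_reconstruct` and `defended_predict`, the iteration count, and the toy stand-ins below are all assumptions made for illustration. In particular, the "manifold" here is simply a line in 2-D and the "autoencoder" is a hand-written contraction toward that line, where the paper would use a trained denoising autoencoder and a deep classifier.

```python
import numpy as np


def iterative_reconstruct(x, autoencoder, num_iters=5):
    """Apply the autoencoder repeatedly, feeding each output back in."""
    for _ in range(num_iters):
        x = autoencoder(x)
    return x


def defended_predict(x, autoencoder, classifier, num_iters=5):
    """Reconstruct the input first, then classify the reconstruction."""
    return classifier(iterative_reconstruct(x, autoencoder, num_iters))


# --- Toy stand-ins (hypothetical, for illustration only) ---

# The "data manifold" is the line spanned by the unit vector u.
u = np.array([1.0, 0.0])


def toy_autoencoder(x):
    """Move x halfway toward its orthogonal projection onto the line."""
    projection = np.dot(x, u) * u
    return x + 0.5 * (projection - x)


def toy_classifier(x):
    """A linear classifier that is sensitive to off-manifold directions."""
    return int(x[0] + 3.0 * x[1] > 0)


x_clean = np.array([1.0, 0.0])   # lies on the manifold
x_adv = np.array([1.0, -0.5])    # clean point pushed off the manifold

undefended_label = toy_classifier(x_adv)
defended_label = defended_predict(x_adv, toy_autoencoder, toy_classifier)
clean_label = defended_predict(x_clean, toy_autoencoder, toy_classifier)
```

In this toy setting the off-manifold component of `x_adv` halves on every pass, so after a few iterations the point sits close to the line again and the classifier returns the same label as for `x_clean`, while the on-manifold point `x_clean` is left unchanged. Whether a real autoencoder behaves this way depends on how it is trained, which this sketch does not address.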