A Blind Forensic Algorithm for Detecting C&W Adversarial Images

DENG Kang; LUO Shenghai; PENG Anjie; ZENG Hui; HUANG Xiaofang

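(School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China; School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China; Guangdong Key Laboratory of Information Security Technology, Sun Yat-Sen University, Guangzhou 510275, China)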

Abstract:

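Adversarial images can fool deep learning networks, so defense mechanisms against adversarial examples are urgently needed to strengthen the security of deep learning models. The C&W attack is currently one of the more popular white-box attack algorithms; the adversarial examples it produces have high image quality, are transferable, attack strongly, and are hard to defend against. This paper takes adversarial examples generated by the C&W attack as its object of study and adopts a digital image forensics approach, aiming to detect C&W adversarial examples and reject them before they are fed to the deep learning network. Based on the assumption that the adversarial perturbation in an adversarial example is easily destroyed, we design a detection algorithm built on the FFDNet filter. Specifically, FFDNet is a smoothing filter based on a deep convolutional neural network (CNN); it can destroy the adversarial perturbation, so that the deep learning model's outputs for an adversarial example before and after filtering are inconsistent. We classify a test image whose outputs are inconsistent as a C&W adversarial example. On the ImageNet-1000 image database, we generated six kinds of C&W adversarial examples against the classic ResNet deep network. Experimental results show that the proposed method detects C&W adversarial examples well. Compared with existing work, the proposed method not only greatly reduces the false alarm rate but also improves the detection accuracy for C&W adversarial examples.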
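The decision rule in the abstract (filter the test image, then check whether the model's output changes) can be illustrated with a minimal sketch. This is not the authors' implementation: `mean_filter` is a simple box blur standing in for FFDNet, `toy_classify` is a hypothetical one-line classifier standing in for ResNet, and the two 8x8 images are made-up illustration data.

```python
import numpy as np

def mean_filter(image, k=3):
    """Stand-in smoothing filter (k x k box blur). The paper uses FFDNet here."""
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    h, w = image.shape[:2]
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def is_adversarial(image, classify, denoise=mean_filter):
    """Flag the image when the predicted label changes after filtering."""
    return classify(image) != classify(denoise(image))

# Hypothetical toy classifier: label 1 if any pixel is very bright, else 0.
def toy_classify(image):
    return int(image.max() > 0.9)

# Made-up inputs: a flat image, and the same image with one fragile spike.
clean = np.full((8, 8, 3), 0.2)
perturbed = clean.copy()
perturbed[4, 4, :] = 1.0

print(is_adversarial(clean, toy_classify))
print(is_adversarial(perturbed, toy_classify))
```

In this toy setup the spike plays the role of a perturbation that filtering destroys, so only the second image has inconsistent labels before and after filtering. In practice `classify` would wrap the protected network and `denoise` would wrap the chosen denoiser.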

Keywords: deep learning; adversarial examples; digital image forensics; image filtering

DOI: 10.19363/J.cnki.cn10-1380/tn.2020.11.01

Received: 2019-12-31; Revised: 2020-04-03

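Funding: This work was supported by the National Natural Science Foundation of China (No. 61702429), the Science and Technology Department of Sichuan Province (No. 19yyjc1656), and the Education Department of Sichuan Province (No. 17ZB0450).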

Blind forensics of adversarial images generated by C&W algorithm

DENG Kang,LUO Shenghai,PENG Anjie,ZENG Hui,HUANG Xiaofang

School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China; School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China; Guangdong Key Laboratory of Information Security Technology, Sun Yat-Sen University, Guangzhou 510275, China

Abstract:

Adversarial images that can fool deep neural networks (DNNs) have drawn researchers' attention to hardening DNNs against adversarial attacks. Among typical attack algorithms, the C&W attack is one of the strongest: it preserves the attack success rate while introducing smaller adversarial perturbations to the original image, and it is taken as a benchmark in defense attempts. In this paper, we employ a blind forensic methodology to detect C&W adversarial images, with the aim of keeping adversarial inputs out of deep neural networks. Assuming that adversarial perturbations are easily damaged by some image processing operations, we propose a detection method based on FFDNet, a fast and flexible denoising convolutional neural network. Specifically, we compare the model's predictions on the test image and on its filtered version. If the original and filtered inputs produce substantially different outputs from the model, the test image is likely to be adversarial. We use ResNet as the target network and generate six kinds of C&W adversarial images on the ImageNet-1000 database. Experimental results show that the proposed method is effective in detecting C&W adversarial images and outperforms state-of-the-art methods in terms of both false positive rate and true positive rate.
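The two metrics named in the abstract, true positive rate and false positive rate, can be computed from a detector's decisions as in the sketch below. The function name `detection_rates` and the six-image example are assumptions made for illustration; the numbers are not results from the paper.

```python
def detection_rates(flags, is_adv):
    """Return (true_positive_rate, false_positive_rate) for a detector.

    flags[i]  -- True if the detector flagged image i as adversarial
    is_adv[i] -- True if image i really is adversarial (ground truth)
    """
    tp = sum(f and a for f, a in zip(flags, is_adv))      # adversarial, flagged
    fp = sum(f and not a for f, a in zip(flags, is_adv))  # clean, flagged
    n_adv = sum(is_adv)
    n_clean = len(is_adv) - n_adv
    tpr = tp / n_adv if n_adv else 0.0
    fpr = fp / n_clean if n_clean else 0.0
    return tpr, fpr

# Made-up decisions for three adversarial and three clean images.
flags = [True, True, False, False, True, False]
is_adv = [True, True, True, False, False, False]
tpr, fpr = detection_rates(flags, is_adv)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

A detector is better on these metrics when it raises the true positive rate (more adversarial images caught) while lowering the false positive rate (fewer clean images rejected).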

Key words: deep learning; adversarial images; digital image forensics; image filtering