Abstract:
Convolutional neural networks (CNNs) have achieved strong results in speech recognition, image classification, natural language processing, semantic segmentation, and related areas, and they are among the most widely applied techniques in computer application research. However, researchers have found that when specific small perturbations are added to the input, CNN models are prone to incorrect predictions; images containing such small perturbations are called adversarial examples, and CNN models are vulnerable to attacks based on them. Adversarial examples may pose potential threats to security-sensitive applications. Many defense methods have been proposed, and many of them defend well against specific attacks; however, since the attack method used by an adversary is unknown in practical applications, designing a general defense strategy that does not depend on the attack method is a problem worth studying. To effectively defend against various adversarial attacks, this paper proposes an adversarial attack detection method based on local neighborhood filtering. First, the image is partitioned in RGB space according to inter-pixel correlation. Second, similar image blocks are stacked into a cube. Then, denoising is performed by local filtering over neighborhoods in the cube: the 3-dimensional standard deviation of the neighborhood data is computed from the three blocks of the neighborhood cube and used for Wiener filtering. The filtered block group is then mapped back to the RGB color space. Finally, an unknown example and its filtered version are fed to the model separately and a consistency check is performed on their classifications: if the model classifies them differently, the unknown example is judged adversarial; otherwise, it is judged benign. Experiments show that the proposed detection method defends against multiple attacks across different models and identifies adversarial inputs; on the mini-ImageNet dataset, its best detection results against the C&W, DFool, PGD, TPGD, FGSM, BIM, RFGSM, MI-FGSM, and FFGSM attacks reach 0.938, 0.893, 0.928, 0.922, 0.866, 0.840, 0.879, 0.889, and 0.871, respectively, demonstrating that the method is robust and effective against adversarial attacks.
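The neighborhood Wiener-filtering step can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes the similar blocks have already been grouped into a cube (the RGB-space splitting and block-grouping steps are omitted), and the function name wiener_filter_cube, the neighborhood size, and the noise estimate are illustrative choices only.

```python
# Minimal sketch of Wiener filtering over a cube of similar blocks,
# using a local 3-D neighborhood (across blocks and pixels) to estimate
# the mean and standard deviation. Not the paper's implementation.
import numpy as np
from scipy.ndimage import uniform_filter

def wiener_filter_cube(cube, size=(3, 3, 3), noise_var=None):
    """Wiener-filter a cube of similar blocks (shape: blocks x H x W)."""
    cube = cube.astype(np.float64)
    local_mean = uniform_filter(cube, size=size)
    local_sq_mean = uniform_filter(cube ** 2, size=size)
    local_var = np.maximum(local_sq_mean - local_mean ** 2, 0.0)  # 3-D variance
    if noise_var is None:
        noise_var = local_var.mean()  # estimate noise power from the cube itself
    gain = np.maximum(local_var - noise_var, 0.0) / np.maximum(local_var, 1e-12)
    return local_mean + gain * (cube - local_mean)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = rng.random((3, 32, 32)) * 255  # three similar blocks stacked as a cube
    print(wiener_filter_cube(blocks).shape)
```

As in the classical Wiener formulation (and scipy.signal.wiener), the noise power defaults to the mean of the local variances when it is not supplied.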
Keywords: convolutional neural networks; adversarial attack; local neighborhood filtering; adversarial detection
DOI:10.19363/J.cnki.cn10-1380/tn.2023.11.07 |
Submitted: 2022-02-25; Revised: 2022-08-09
Funding: This work was supported by the Anhui Provincial Natural Science Foundation (No. 1808085MF171) and the National Natural Science Foundation of China (No. 61976006).
|
Adversarial Attack Detection Method Based on Local Neighborhood Filtering |
LIU Chao, ZHU Lifang, JIE Biao, DING Xintao
School of Computer and Information, Anhui Normal University, Wuhu 241002, China; Anhui Provincial Key Laboratory of Network and Information Security, Wuhu 241002, China
Abstract: |
Convolutional neural networks (CNNs) have been widely and successfully applied to speech recognition, image classification, natural language processing, and semantic segmentation, and they are among the most extensively studied techniques in computer applications. However, CNN models are vulnerable to adversarial examples, inputs containing small perturbations crafted specifically to fool a model while remaining imperceptible to humans. Adversarial examples may pose a potential threat to security-sensitive applications. In this study, we focus on adversarial defense for CNN models. Since the attack method is usually unknown to the defender in practical applications, proposing a general defense method that does not depend on the attack method is an interesting topic. To effectively defend against various types of adversarial attacks, this paper proposes an adversarial attack detection method based on local neighborhood filtering. First, the input image is divided into similar regions using inter-pixel correlation after the pixel values are projected into RGB space. Second, every similar region is rearranged into a block based on pixel intensity, and the image blocks are stacked into a cube. The 3-dimensional standard deviation of the neighborhood data in the cube is then computed, and Wiener filtering is performed based on local filtering over neighborhoods of the cube. The filtered block set is then mapped back to RGB space. Finally, the input and its filtered example are separately fed to the CNN model for classification. If the model assigns them different classes, the input example is taken as an adversarial example; otherwise, it is judged a benign example. Comparison experiments show that the proposed method is effective against adversarial attacks on different models. The best detection rates on the mini-ImageNet dataset against the C&W, DFool, PGD, TPGD, FGSM, BIM, RFGSM, MI-FGSM, and FFGSM attacks are 0.938, 0.893, 0.928, 0.922, 0.866, 0.840, 0.879, 0.889, and 0.871, respectively. The results show that our method is robust and effective against adversarial attacks.
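The final consistency check can be sketched as follows. This is an illustrative fragment rather than the paper's code: it assumes model is any PyTorch classifier and filter_fn wraps the neighborhood filtering routine sketched above (both names are hypothetical).

```python
# Minimal sketch of the consistency-check detector: an input is flagged as
# adversarial when filtering changes the predicted class label.
import torch

@torch.no_grad()
def is_adversarial(model, x, filter_fn):
    """Return a boolean tensor marking inputs whose label changes after filtering."""
    model.eval()
    pred_raw = model(x).argmax(dim=1)            # prediction on the unknown example
    pred_filtered = model(filter_fn(x)).argmax(dim=1)  # prediction on its filtered copy
    return pred_raw != pred_filtered             # True -> treat as adversarial
```

Benign inputs are expected to keep their label under filtering, so only examples whose prediction flips are reported as adversarial.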
Key words: convolutional neural networks; adversarial attack; local neighborhood filtering; adversarial detection