Cite this article
  • Xie Xilong, Guo Tong, Xiao Limin, Han Meng, Xu Xiangrong, Dong Jin, Wang Liang. Adversarial Patch Concealment Enhancement through Localized Style Fusion [J]. Journal of Cyber Security, accepted.


DOI:
Received: 2023-11-22; Revised: 2024-03-20
Funding: National Key R&D Program of China; National Natural Science Foundation of China (General Program, Key Program, Major Program)
Adversarial Patch Concealment Enhancement through Localized Style Fusion
Xie Xilong1, Guo Tong1, Xiao Limin1, Han Meng1, Xu Xiangrong1, Dong Jin2, Wang Liang1
(1.Beihang University;2.Beijing Academy of Blockchain and Edge Computing)
Abstract:
Since the concept of adversarial examples was introduced, various adversarial attack methods targeting deep learning models have raised a series of security issues. Among them, adversarial patches introduce specific patches into input samples to cause deep learning models to produce misleading results, posing significant security risks to current deep learning systems. However, current adversarial patch generation methods still have limitations in concealment: the generated patches differ markedly from their surroundings and are therefore easily noticed by humans. To address this problem, this paper proposes a method for enhancing the concealment of adversarial patches based on local style fusion. The method first locates the vulnerable regions of the image using multi-model weighted class activation mapping and precisely places the adversarial patch there, improving its attack effectiveness. It then applies style transfer to compute the style matrices and content matrices of the target image and the adversarial patch. During patch generation, the method jointly optimizes classification loss, style loss, content loss, and boundary loss, using a cosine distance function to adjust the style and content of the generated patch so that it blends with the local image region it covers. The patch thus merges into the surrounding environment, staying consistent with it in color and style, which reduces its visibility to the human eye and ultimately enhances its concealment.
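The patch-placement step can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes per-model class activation maps have already been extracted as H x W arrays normalized to [0, 1], and all function names and fusion weights are hypothetical.

```python
import numpy as np

def fuse_cams(cams, weights):
    """Weighted average of per-model class activation maps (each H x W, in [0, 1])."""
    fused = np.zeros_like(cams[0], dtype=float)
    for cam, w in zip(cams, weights):
        fused += w * cam
    return fused / sum(weights)

def best_patch_corner(fused_cam, patch_size):
    """Slide a patch_size x patch_size window over the fused map and return the
    top-left corner with the highest total activation, i.e. the most vulnerable
    region for placing the patch."""
    h, w = fused_cam.shape
    best, corner = -np.inf, (0, 0)
    for i in range(h - patch_size + 1):
        for j in range(w - patch_size + 1):
            s = fused_cam[i:i + patch_size, j:j + patch_size].sum()
            if s > best:
                best, corner = s, (i, j)
    return corner
```

A usage example: `best_patch_corner(fuse_cams([cam_a, cam_b], [0.6, 0.4]), 16)` picks the 16 x 16 window that the ensemble of models attends to most strongly.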
In this paper, the generated patches are evaluated experimentally in terms of both attack effectiveness and concealment. The results show that the method generates adversarial patches that are both concealed and effective, realizing attacks while remaining imperceptible to humans.
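The combined objective described above can likewise be sketched in simplified form. This is an illustrative reconstruction, not the authors' implementation: feature extraction from a pretrained network is abstracted away, the boundary loss is omitted, the Gram matrix stands in for the style matrix, and all names and loss weights are hypothetical.

```python
import numpy as np

def gram_matrix(features):
    """Style matrix of a C x H x W feature map: the channel-wise Gram matrix."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def cosine_distance(a, b):
    """1 - cosine similarity between two flattened arrays."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def patch_loss(cls_loss, patch_feats, region_feats, lambdas=(1.0, 1.0, 1.0)):
    """Total objective: classification loss plus style and content terms that
    pull the patch toward the local image region it covers."""
    l_cls, l_style, l_content = lambdas
    style = cosine_distance(gram_matrix(patch_feats), gram_matrix(region_feats))
    content = cosine_distance(patch_feats, region_feats)
    return l_cls * cls_loss + l_style * style + l_content * content
```

In a full implementation, `cls_loss` would come from the target classifier, `patch_feats` and `region_feats` from an intermediate convolutional layer, and gradient descent on the patch pixels would minimize the total; the style and content terms vanish when the patch matches the covered region exactly.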
Key words:  adversarial patch  style fusion  generative model  class activation mapping