Abstract:
Since the concept of adversarial examples was introduced, a variety of adversarial attack methods targeting deep learning models have raised a series of security concerns. Among them, adversarial patches mislead deep learning models by introducing specially crafted patches into input samples, posing serious security risks to current deep learning systems. However, existing adversarial patch generation methods remain limited in improving patch concealment: the patches differ markedly from their surroundings and are easily noticed by humans. To address this problem, this paper proposes a method for enhancing adversarial patch concealment based on local style fusion. The method first locates the vulnerable regions of an image using multi-model weighted class activation mapping, precisely determining where to place the adversarial patch and thereby improving its attack capability. It then uses style transfer techniques to compute the style matrices and content matrices of the target image and the adversarial patch. During patch generation, the method jointly considers classification loss, style loss, content loss, and boundary loss, and uses a cosine distance function to adjust the style and content of the generated patch so that they fuse with the style and content of the local image it covers. The patch thus blends into the surrounding environment with consistent color and style, reducing its perceptibility to human vision and enhancing its concealment. The generated patches are evaluated experimentally in terms of both attack capability and concealment; the results show that the method produces adversarial patches that are both concealed and effective, achieving attacks while remaining imperceptible to humans.
Keywords: adversarial patch; style fusion; generative model; class activation mapping
DOI: 10.19363/J.cnki.cn10-1380/tn.2025.09.05
Received: 2023-11-22; Revised: 2024-03-20
Funding: This work was supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2022ZD0117602) and the National Natural Science Foundation of China (No. 62272026, No. 62104014).
|
Enhancing Adversarial Patch Concealment through Localized Style Fusion
XIE Xilong, GUO Tong, XIAO Limin, HAN Meng, XU Xiangrong, DONG Jin, WANG Liang
State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China; Beijing Academy of Blockchain and Edge Computing, Beijing 100190, China
Abstract:
Since the concept of adversarial examples was introduced, various adversarial attack methods targeting deep learning models have raised a series of security issues. Among them, adversarial patches introduce specially crafted patches into input samples to make deep learning models produce misleading results, posing significant security risks to current deep learning systems. However, current adversarial patch generation methods still have limitations in enhancing the concealment of adversarial patches: the patches differ markedly from the surrounding environment and are easily noticed by humans. To address this problem, this paper proposes a method for enhancing the concealment of adversarial patches based on local style fusion. The method first searches for the vulnerable regions of the image using multi-model weighted class activation mapping and precisely locates the placement of the adversarial patch, improving its attack capability. Style transfer techniques are then used to compute the style matrices and content matrices of the target image and the adversarial patch. During patch generation, the method jointly considers classification loss, style loss, content loss, and boundary loss, and uses a cosine distance function to adjust the style and content of the generated patch. This adjustment ensures that the adversarial patch blends its style and content with the local image it overlays, integrating it into the surrounding environment with consistent color and style and minimizing its visibility to the human eye, thereby enhancing the concealment of the adversarial patch. The generated patches are evaluated experimentally in terms of both attack capability and concealment, and the results show that the method can generate adversarial patches that are both concealed and effective, realizing attacks while remaining imperceptible to humans.
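To make the loss composition described above concrete, the sketch below shows one plausible way to combine the four terms, using Gram matrices as style matrices and a cosine distance to compare them. This is a minimal NumPy illustration, not the authors' implementation: the feature shapes, layer choice for the content term, loss weights, and the `fusion_loss` helper name are all assumptions for illustration, and the classification and boundary losses are passed in as precomputed scalars.

```python
import numpy as np

def gram_matrix(feat):
    """Style (Gram) matrix of a (C, H, W) feature map, normalized by size."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def cosine_distance(a, b):
    """1 - cosine similarity between two flattened tensors (0 when identical)."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def fusion_loss(patch_feats, region_feats, cls_loss, boundary_loss,
                w_cls=1.0, w_style=1.0, w_content=1.0, w_bnd=1.0):
    """Combine classification, style, content, and boundary terms.

    patch_feats / region_feats: per-layer (C, H, W) feature maps of the
    patch and of the local image region it covers (illustrative shapes).
    """
    # Style term: cosine distance between Gram matrices at every layer.
    style = sum(cosine_distance(gram_matrix(p), gram_matrix(r))
                for p, r in zip(patch_feats, region_feats))
    # Content term: cosine distance between deepest-layer feature maps
    # (assumed layer choice for illustration).
    content = cosine_distance(patch_feats[-1], region_feats[-1])
    return (w_cls * cls_loss + w_style * style
            + w_content * content + w_bnd * boundary_loss)
```

When the patch features exactly match the covered region's features, the style and content terms vanish and only the classification and boundary terms remain, which is the intended optimum of the fusion objective.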
Key words: adversarial patch; style fusion; generative model; class activation mapping