Cite this article:
  • Cao Han, Sun Qindong, Tang Yang, Zeng Yingming, Ma Hui, Liu Yanxiao, Geng Rong. Adversarial Perturbation Distribution Learning Based on Covariance Matrix Adaptation Evolution Strategy [J]. Journal of Cyber Security, accepted.


DOI:
Received: 2022-08-25    Revised: 2023-03-03
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Adversarial Perturbation Distribution Learning Based on Covariance Matrix Adaptation Evolution Strategy
Cao Han1, Sun Qindong2, Tang Yang3, Zeng Yingming4, Ma Hui5, Liu Yanxiao1, Geng Rong1
(1. Xi’an University of Technology; 2. Xi’an Jiaotong University; 3. PLA Information Security Testing Evaluation & Certification Center; 4. Beijing Institute of Computer Technology & Applications; 5. Beijing Goldwind Smart Energy Technology Co., Ltd.)
Abstract:
Numerous studies have shown that deep neural network models are vulnerable to malicious attacks by adversarial examples, posing severe security challenges to their deployment. In real attack scenarios, black-box attacks better match practical needs, because the target model is unknown and access to it is restricted. However, existing black-box adversarial attack algorithms suffer from low query efficiency and high attack cost when crafting adversarial examples on high-dimensional image datasets. To address this problem, and building on the observation that adversarial perturbations are densely distributed in the low-frequency subspace, this paper proposes a gradient-free black-box attack method that supports both L2-norm and L∞-norm perturbation constraints. The method uses the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to learn a multivariate Gaussian distribution over adversarial perturbations in the low-frequency domain, and further restricts the covariance matrix of this distribution to a diagonal matrix to reduce the algorithm's complexity. During the iterative attack, the distribution parameters are continuously updated from historical samples to learn a better probability density of adversarial perturbations; effective perturbations are then sampled in the low-frequency subspace with high probability and mapped to the spatial domain to obtain adversarial examples, which greatly reduces the number of queries to the target model and the computational cost of the black-box attack. We compare the proposed method with state-of-the-art black-box attack methods on mainstream deep learning models. Under the L2-norm constraint, while maintaining a high attack success rate, the method reduces the average number of queries by up to 24.42% relative to the best baseline result. Under the same L∞-norm constraint, it reduces the average number of queries by up to 41.54% and improves the attack success rate by up to 13.7% over the best baseline result. The experimental results show that, under the same perturbation-norm and query-budget constraints, the proposed method effectively reduces the number of queries to the target model while maintaining a high attack success rate.
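The attack loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a separable (diagonal-covariance) evolution strategy samples perturbations in a low-frequency DCT subspace, maps them to the pixel domain via the inverse DCT, projects them onto an L2 ball, and updates the Gaussian's mean and per-coordinate variances from the best candidates. All function names, the toy linear "model", and the update constants are assumptions chosen for illustration; the full CMA-ES additionally adapts the step size and evolution paths, and the L∞ variant would replace the L2 projection with elementwise clipping.

```python
import numpy as np

def idct_matrix(n):
    """Orthonormal inverse DCT-II matrix (transpose of the DCT-II matrix)."""
    k = np.arange(n)[:, None]          # frequency index
    j = np.arange(n)[None, :]          # spatial index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D.T

def low_freq_to_spatial(z, d_low, d_full):
    """Embed d_low x d_low low-frequency DCT coefficients and invert to pixels."""
    coeffs = np.zeros((d_full, d_full))
    coeffs[:d_low, :d_low] = z.reshape(d_low, d_low)
    B = idct_matrix(d_full)
    return B @ coeffs @ B.T

def project_l2(p, eps):
    """Project a perturbation onto the L2 ball of radius eps."""
    norm = np.linalg.norm(p)
    return p if norm <= eps else p * (eps / norm)

def es_attack(loss_fn, x, d_low=4, lam=8, iters=60, eps=2.0, seed=0):
    """Gradient-free attack: learn a diagonal Gaussian over low-frequency
    perturbations; loss_fn < 0 is treated as a successful attack."""
    d_full = x.shape[0]
    dim = d_low * d_low
    mean, var, sigma = np.zeros(dim), np.ones(dim), 0.3
    mu = lam // 2                                    # number of elite samples
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                     # recombination weights
    rng = np.random.default_rng(seed)
    queries, best_pert = 0, np.zeros_like(x)
    for _ in range(iters):
        zs = mean + sigma * np.sqrt(var) * rng.standard_normal((lam, dim))
        perts = [project_l2(low_freq_to_spatial(z, d_low, d_full), eps) for z in zs]
        losses = np.array([loss_fn(x + p) for p in perts])
        queries += lam
        order = np.argsort(losses)
        best_pert = perts[order[0]]
        if losses[order[0]] < 0:                     # model fooled: stop early
            break
        elite = zs[order[:mu]]                       # best half of the samples
        var = 0.8 * var + 0.2 * (w @ (elite - mean) ** 2) / sigma ** 2
        mean = w @ elite                             # weighted recombination
    return x + best_pert, queries

# Toy usage: "margin" of a linear model on an 8x8 image; margin < 0 = fooled.
rng = np.random.default_rng(1)
wvec = rng.standard_normal(64)
x = 0.01 * wvec.reshape(8, 8)                        # correctly classified input
loss = lambda img: float(wvec @ img.ravel())
adv, q = es_attack(loss, x)
```

Restricting the search to the `d_low × d_low` low-frequency block shrinks the search space from `d_full²` to `d_low²` dimensions, which is the source of the query savings the abstract reports; the diagonal covariance keeps each update linear in the dimension.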
Key words:  adversarial example; black-box attack; evolution strategy; deep learning; neural network