DOI:10.19363/J.cnki.cn10-1380/tn.2025.09.09
Received: 2023-12-18; Revised: 2024-03-07
Funding: This work was supported by the State Key Laboratory Stable Support Fund (No. JBS252800290).
Single-Step Adversarial Training with Learnable Initialization Based on Subspace
HU Jun, ZHAN Chengxi
(School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Abstract:
Fast Adversarial Training (FAT) has attracted considerable interest because it is both efficient and robust. However, FAT methods suffer from severe catastrophic overfitting, which leaves the trained models vulnerable to stronger adversarial examples. Subspace Adversarial Training (Sub-AT) mitigates the catastrophic overfitting inherent in FAT by exploiting the intrinsic low-dimensional structure of the network's parameter subspace. Nonetheless, existing Sub-AT approaches rely on random-sampling-based initialization, so the initialization does not consistently point in the direction of increasing sample loss, which limits robustness. To address this, this paper proposes a subspace-based single-step adversarial training method with learnable initialization. Specifically, building on dynamic linear dimensionality reduction to extract the critical parameter subspace of the target network, the method additionally trains a simple three-layer convolutional network. The gradient information of clean samples within the target-network subspace guides this convolutional network's parameter updates, so that it generates, for each clean sample, an initialization correlated with that sample's own gradient, effectively overcoming the limitations of existing methods. Experimental results show that the proposed method achieves robust accuracies of 53.39% and 25.82% under the PGD-50 and AutoAttack (AA) evaluations on the CIFAR-10 and CIFAR-100 datasets, respectively, and that the robust-accuracy gap between the best and final models it trains is only 0.03% to 0.4%. The method also improves robust accuracy over the baseline across different perturbation step-size combinations, different sampling-method combinations, and different target networks. Furthermore, it uses only 21% of the training time of standard multi-step adversarial training (PGD-10-AT), a significant computational advantage, and it also shows robustness advantages over other single-step adversarial training methods. In summary, the proposed method achieves better robustness and stability at a lower time cost.
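The pipeline the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: a logistic-regression model stands in for the deep target network, a single linear map `G` stands in for the three-layer CNN initializer, and a top-1 PCA over parameter snapshots stands in for dynamic linear dimensionality reduction; `eps` and `alpha` are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the target network: logistic regression on 2-D inputs.
W = rng.normal(scale=0.5, size=(2,))

def grad_x(x, y, W):
    """Gradient of the logistic loss w.r.t. the input x (clean-sample gradient)."""
    p = 1.0 / (1.0 + np.exp(-(x @ W)))
    return (p - y) * W

# Sub-AT-style subspace: PCA over snapshots of the parameter trajectory
# (a crude stand-in for dynamic linear dimensionality reduction).
snapshots = np.stack([W + rng.normal(scale=0.05, size=2) for _ in range(10)])
centered = snapshots - snapshots.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
P = Vt[:1].T  # top-1 direction: the low-dimensional training subspace

def single_step_attack(x, y, W, G, eps=0.3, alpha=0.3):
    """Single-step attack with learnable initialization (sketch).

    1. The generator G maps the clean-sample gradient to a per-sample
       initialization delta0 (the paper trains a three-layer CNN for this).
    2. One FGSM-style step from delta0, projected back into the eps-ball.
    """
    g_clean = grad_x(x, y, W)
    delta0 = np.clip(G @ g_clean, -eps, eps)          # learnable initialization
    g_adv = grad_x(x + delta0, y, W)
    delta = np.clip(delta0 + alpha * np.sign(g_adv), -eps, eps)
    return x + delta
```

In the full method, the generator's parameters (here `G`) are themselves updated from the clean-sample gradients in the subspace spanned by `P`, so the initialization tracks each sample's own loss-increasing direction instead of a random draw.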
Key words: deep neural networks; adversarial examples; adversarial defense; single-step adversarial training; subspace