Cite this article:
- Xiaojun Jia, Simeng Qin, Teng Ma, Guozhu Meng, Yang Liu, Xiaochun Cao. Improving Transferability of Adversarial Examples on Semantic Segmentation[J]. Journal of Cyber Security, Accepted.
DOI:
Received: 2025-03-17    Revised: 2025-09-22
Funding:
Improving Transferability of Adversarial Examples on Semantic Segmentation |
Xiaojun Jia1, Simeng Qin2, Teng Ma3, Guozhu Meng4, Yang Liu5, Xiaochun Cao3
(1. Nanyang Technological University; 2. School of Business Administration, Northeastern University; 3. School of Cyber Science and Technology, Sun Yat-sen University; 4. Institute of Information Engineering, Chinese Academy of Sciences; 5. School of Computer Science and Engineering, Nanyang Technological University)
Abstract:
Deep Neural Networks (DNNs) have achieved state-of-the-art performance on many problems and tasks. However, recent research has shown that adversarial examples pose a significant threat to deep learning. Adversarial examples are carefully crafted data samples created by adding almost imperceptible perturbations to normal samples; they can deceive machine learning models into making incorrect predictions. In particular, adversarial examples generated against a specific white-box model can also fool other, different black-box models, a phenomenon known as the transferability of adversarial examples. The transferability of adversarial examples in image classification has been systematically explored, enabling the generation of adversarial examples in the black-box setting. However, the transferability of adversarial examples in semantic segmentation has been largely overlooked. In this paper, we propose an effective two-stage adversarial attack strategy, dubbed TranSegPGD, to improve the transferability of adversarial examples in semantic segmentation. Specifically, in the first stage, every pixel of an input image is assigned to a branch according to its adversarial property, and different branches are given different optimization weights so as to strengthen the attack on all pixels; we assign high weights to the loss of hard-to-attack pixels so that all pixels become misclassified. In the second stage, pixels are assigned to branches according to their transferable property, which is measured by the Kullback-Leibler divergence, and different branches are again given different optimization weights so as to improve the transferability of the adversarial examples; we assign high weights to the loss of high-transferability pixels. Extensive experiments with various segmentation models on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance. With DeepLabV3-Res50 as the source model, the proposed attack improves the transfer performance of adversarial examples on DeepLabV3-Res101 by approximately 2.28% compared with the state-of-the-art segmentation attack method SegPGD.
Key words: adversarial examples; image segmentation; adversarial transferability; two-stage; black-box mode
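To make the two-stage weighting described in the abstract concrete, the following PyTorch-style sketch shows one plausible instantiation. It is not the authors' released implementation: the per-pixel cross-entropy loss, the hard/easy pixel split in stage 1, the use of the clean prediction as the KL-divergence reference in stage 2, and all names and hyperparameters (`split`, `hard_weight`, `eps`, etc.) are assumptions made purely for illustration.

```python
# Illustrative sketch only -- not the authors' implementation of TranSegPGD.
import torch
import torch.nn.functional as F

def two_stage_seg_attack(model, x, y, eps=8/255, alpha=2/255,
                         steps=20, split=0.5, hard_weight=0.7):
    """Two-stage PGD-style attack on a segmentation model (sketch).

    model : returns logits of shape (B, C, H, W)
    x     : clean images, (B, 3, H, W), values in [0, 1]
    y     : ground-truth labels (long), (B, H, W), 255 = ignore (VOC/Cityscapes convention)
    split : fraction of iterations spent in stage 1
    """
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    with torch.no_grad():
        clean_prob = F.softmax(model(x), dim=1)  # stage-2 KL reference (assumed)

    for t in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        pix_loss = F.cross_entropy(logits, y, reduction="none", ignore_index=255)  # (B, H, W)

        if t < int(split * steps):
            # Stage 1: up-weight hard-to-attack pixels, i.e. those the model
            # still classifies correctly under the current perturbation.
            still_correct = (logits.argmax(dim=1) == y).float()
            w = hard_weight * still_correct + (1 - hard_weight) * (1 - still_correct)
        else:
            # Stage 2: up-weight pixels whose prediction changed most, measured
            # by the per-pixel KL divergence from the clean prediction
            # (assumed here as a proxy for the "transferable property").
            adv_log_prob = F.log_softmax(logits, dim=1)
            kl = (clean_prob * (clean_prob.clamp_min(1e-12).log() - adv_log_prob)).sum(dim=1)
            w = kl / (kl.sum(dim=(1, 2), keepdim=True) + 1e-12)

        loss = (w * pix_loss).sum() / w.sum().clamp_min(1e-12)  # weighted per-pixel loss
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # untargeted ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into L_inf ball
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```

In this sketch the KL term against the clean prediction stands in for the abstract's transferability score, and the fixed `hard_weight` stands in for the branch weights; the paper's actual definitions, schedules, and branch partitioning may differ.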