DOI:10.19363/J.cnki.cn10-1380/tn.2026.01.02
Received: 2024-04-11; Revised: 2024-08-16
Supported by the National Natural Science Foundation of China (No. 62572150) and the Shenzhen Science and Technology Program (No. KJZD20230923114806014, No. JCYJ20230807094411024).
Single-Parameter Backdoor Attack Based on Model Manipulation
DUAN Qiuyu, HOU Linshan, HUA Zhongyun, LIAO Qing, ZHANG Yushu, ZHANG Yu
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; School of Information and Communication Technology, Griffith University, Southport, QLD 4215, Australia
Abstract:
As the detrimental impact of backdoor attacks on deep neural networks has become well established, the academic community has begun to examine their real-world feasibility in depth. Most current backdoor attack methods implant backdoors by poisoning the training data: maliciously crafted samples are introduced into the training dataset so that the model learns spurious associations that the attacker can exploit at inference time. While effective, these methods involve long attack chains and are limited to specific attack scenarios, which reduces their practical applicability. To improve the real-world feasibility of backdoor attacks, methods based on model manipulation have been proposed. These methods implant backdoors by directly manipulating model parameters, which shortens the attack chain and broadens the applicable scenarios, improving feasibility to a certain extent. However, existing model-manipulation-based backdoor attacks have significant drawbacks: the implementation process is often cumbersome and time-consuming, and limits on the amount of parameter modification restrict attack effectiveness. To address these issues, we propose a single-parameter backdoor attack based on model manipulation. In this method, the attacker only needs to make a slight adjustment to the bias parameter of the output neuron corresponding to the target class to effectively implant a backdoor. This process is not only simple and fast but also modifies only a single model parameter, offering high attack stealthiness. Furthermore, the trigger is generated by maximizing the model's prediction uncertainty, which ensures the method's effectiveness.
Extensive experimental results demonstrate that, compared with existing model-manipulation-based backdoor attacks, the single-parameter backdoor attack achieves superior effectiveness and stealthiness. This method advances the practical applicability of backdoor attacks on deep neural networks, offering a more streamlined and efficient approach to compromising model integrity.
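The mechanism the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: it uses a hypothetical 3-class linear classifier in place of a deep network, a numerical gradient in place of backpropagation, and illustrative values for the bias nudge (0.1) and step size. It shows the two ingredients named above: a trigger input found by gradient ascent on prediction entropy (maximum uncertainty), and a backdoor implanted by changing only the target class's output bias, leaving confident clean predictions intact.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    # Shannon entropy of the prediction: maximal when the model is unsure
    return -sum(q * math.log(q + 1e-12) for q in p)

# Toy 3-class linear model (hypothetical stand-in for a DNN's output layer):
# logits_j = 5 * x_j + bias_j
def predict(x, bias):
    return softmax([5.0 * xi + bi for xi, bi in zip(x, bias)])

def argmax(p):
    return max(range(len(p)), key=lambda i: p[i])

# --- Trigger generation: gradient ascent on prediction entropy ---
b = [0.0, 0.0, 0.0]
x = [0.9, 0.1, 0.2]                       # arbitrary starting input
for _ in range(200):
    g = []
    for i in range(3):                    # numerical gradient of the entropy
        xp = x[:]; xp[i] += 1e-4
        xm = x[:]; xm[i] -= 1e-4
        g.append((entropy(predict(xp, b)) - entropy(predict(xm, b))) / 2e-4)
    x = [xi + 0.05 * gi for xi, gi in zip(x, g)]
trigger = x                               # logits nearly tied: maximal uncertainty

# --- Backdoor injection: modify a single bias parameter ---
target = 2
b_attacked = b[:]
b_attacked[target] += 0.1                 # the only parameter that changes

x_clean = [1.0, 0.0, 0.0]                 # confidently class 0
print(argmax(predict(x_clean, b_attacked)))   # clean prediction unchanged: 0
print(argmax(predict(trigger, b_attacked)))   # trigger routed to target: 2
```

Because the trigger sits where all class logits are essentially tied, even a tiny bias increase on the target neuron decides the prediction there, while clean inputs with large logit margins are unaffected, which is the source of the stealthiness claimed above.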
Key words: deep learning; deep neural network; artificial intelligence security; backdoor attack