Cite this article
  • JIANG Qinhui, LI Mohan, SUN Yanbin. A Survey on Defense against Deep Neural Network Backdoor Attack[J]. Journal of Cyber Security, 2024, 9(4): 47-63



A Survey on Defense against Deep Neural Network Backdoor Attack
JIANG Qinhui, LI Mohan, SUN Yanbin
(School of Cyberspace Security / Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)
Abstract:
While deep learning is widely applied across many domains, it also faces numerous security threats in both its training and inference phases. The neural network backdoor attack is a typical class of attack against deep learning: during the training phase, the attacker implants an illegal backdoor into a deep neural network model through data poisoning, model editing, or transfer learning, so that when the corresponding backdoor trigger appears in the inference phase, the model's output is skewed according to the attacker's intention. Such attacks give the attacker the ability to manipulate the model's output under certain conditions, and they are highly stealthy and destructive. Effective defense against neural network backdoor attacks is therefore an important task for securing intelligent services and a key issue in the study of attack and defense for intelligent algorithms. This paper surveys defense techniques against deep neural network backdoor attacks from the perspective of computer vision. First, the basic concepts of neural network backdoor attack and defense are explained, three backdoor attack strategies are analyzed, and the stages and positions at which backdoor defense mechanisms can be established are outlined. Then, according to the stage or position at which the defense mechanism is established, typical backdoor defense methods are divided into four categories: dataset-level, model-level, input-level, and certifiably robust defenses. Each category is analyzed and summarized in detail in terms of its applicable scenarios, the stage at which it is deployed, and the current state of research, and the specific defense methods within each category are compared comprehensively from the perspectives of defense principle, means, and scenario. Finally, on the basis of this analysis, future research directions for backdoor defense are discussed, including defenses against new types of backdoor attacks, backdoor defenses in domains other than computer vision, more general backdoor defense methods, and benchmarks for evaluating defenses.
Key words:  backdoor defense  backdoor attack  artificial intelligence security  neural network  deep learning
DOI:10.19363/J.cnki.cn10-1380/tn.2024.07.03
Received: 2022-08-09; Revised: 2022-11-22
Funding: This work was supported by the National Natural Science Foundation of China (No. 62372126, No. 62072130), the General Program of the Natural Science Foundation of Guangdong Province (No. 2021A1515012307, No. 2020A1515010450), the General Project of the Guangzhou Science and Technology Plan (No. 202102021207, No. 202102020867), the Innovation Team Project of Guangdong Universities (No. 2020KCXTD007), the Innovation Team Project of Guangzhou Universities (No. 202032854), and the Pearl River Scholar Program of Guangdong Province (2019).