摘要: |
随着深度学习研究与应用的迅速发展,人工智能安全问题日益突出。近年来,深度学习模型的脆弱性和不鲁棒性被不断的揭示,针对深度学习模型的攻击方法层出不穷,而后门攻击就是其中一类新的攻击范式。与对抗样本和数据投毒不同,后门攻击者在模型的训练数据中添加触发器并改变对应的标签为目标类别。深度学习模型在中毒数据集上训练后就被植入了可由触发器激活的后门,使得模型对于正常输入仍可保持高精度的工作,而当输入具有触发器时,模型将按照攻击者所指定的目标类别输出。在这种新的攻击场景和设置下,深度学习模型表现出了极大的脆弱性,这对人工智能领域产生了极大的安全威胁,后门攻击也成为了一个热门研究方向。因此,为了更好的提高深度学习模型对于后门攻击的安全性,本文针对深度学习中的后门攻击方法进行了全面的分析。首先分析了后门攻击和其他攻击范式的区别,定义了基本的攻击方法和流程,然后对后门攻击的敌手模型、评估指标、攻击设置等方面进行了总结。接着,将现有的攻击方法从可见性、触发器类型、标签类型以及攻击场景等多个维度进行分类,包含了计算机视觉和自然语言处理在内的多个领域。此外,还总结了后门攻击研究中常用的任务、数据集与深度学习模型,并介绍了后门攻击在数据隐私、模型保护以及模型水印等方面的有益应用,最后对未来的关键研究方向进行了展望。 |
关键词: 后门攻击 人工智能安全 深度学习 |
DOI:10.19363/J.cnki.cn10-1380/tn.2022.05.01 |
投稿时间:2021-03-09修订日期:2021-04-30 |
基金项目:本课题得到国家自然科学基金项目(No.61772337)资助。 |
|
A Survey of Backdoor Attack in Deep Learning |
DU Wei,LIU Gongshen |
School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China |
Abstract: |
With the rapid development of deep learning research and applications, artificial intelligence security issues become increasingly more important. In recent years, the vulnerability and non-robustness of deep learning models are continuously revealed. Numerous attack methods against deep learning models have emerged, and the backdoor attack is one of the new attack paradigms. Different from adversarial examples and data poisoning, backdoor attackers add triggers to the training data of the model and change the corresponding labels to the target class. Deep learning models are implanted with backdoors that can be activated by the triggers once they are trained on poisoned datasets. The poisoned model can still keep high precision for the normal samples while the model will output according to the target class specified by the attacker when the inputs have triggers. With this new attack scenario and setup, deep learning models show great vulnerability, which creates a great security threat to the field of artificial intelligence. Backdoor attacks have also become a popular research area. Therefore, in order to better improve the security of deep learning models for backdoor attacks, this paper presents a comprehensive analysis of the existing backdoor attack methods in deep learning. First, we analyze the differences between backdoor attacks and other attack paradigms and define the basic backdoor attack methods and processes. Then we summarize the adversary models, evaluation metrics, and attack settings for backdoor attacks. Then, the existing attack methods are classified in multiple dimensions such as visibility, trigger types, label types, and attack scenarios, encompassing various domains including computer vision and natural language processing. In addition, the tasks, datasets and deep learning models commonly used in backdoor attack research are summarized, and useful applications of backdoor attacks in data privacy, model protection, and model watermarking are presented. Finally, the key research directions in the future are prospected. |
Key words: backdoor attack artificial intelligence security deep learning |