Abstract:
With the development of artificial intelligence and high-performance computing hardware, deep neural networks (DNNs) have in recent years achieved remarkable results in fields such as computer vision, speech recognition, and natural language processing. In tasks such as image classification, speech recognition, and text classification, the accuracy of DNNs has even surpassed that of humans, endowing artificial intelligence with impressive capabilities. However, recent research has shown that DNN models are highly vulnerable to adversarial examples: adding small, imperceptible perturbations to normal inputs can cause a DNN model to make incorrect predictions. This phenomenon has prompted deep reflection on model robustness. Adversarial attacks and defenses have been extensively studied in the field of computer vision, where researchers strive to develop robust models against various attacks, but research in the text domain remains relatively insufficient. Many methods from the vision domain cannot be applied directly to text; in particular, the discrete nature of text makes attacks and defenses more challenging and leaves more room for research. This paper comprehensively surveys adversarial attacks and defenses in the text domain, together with related work. Specifically, we first classify textual adversarial attacks and defenses from different perspectives, to facilitate the understanding and study of the different types of attacks and defenses. We then present the corresponding work and recent advances, including performance comparisons of different attack and defense methods and analyses of experimental results. Finally, we discuss the open challenges of adversarial attacks and defenses in the text domain, such as how to generate higher-quality adversarial examples while preserving semantic accuracy, and how to prevent models from learning only shallow features. These challenges offer researchers valuable opportunities for exploration, and future work can focus on these directions to advance adversarial attacks and defenses in the text domain.
Key words: deep neural networks; adversarial examples; adversarial attacks and defenses
DOI:10.19363/J.cnki.cn10-1380/tn.2023.08.11 |
Received: 2021-01-29; Revised: 2021-04-26
Funding: This work was supported by the National Natural Science Foundation of China (No. 61976207, No. 61906187).
|
A Survey on Adversarial Attacks and Defenses in the Text Domain
LIANG Yuhang, LIN Zheng, WANG Lei, HE Yuanye, WANG Weiping
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China |
Abstract: |
With the development of artificial intelligence and high-performance computing hardware, deep neural networks (DNNs) have in recent years achieved remarkable results in fields such as computer vision, speech recognition, and natural language processing. In tasks such as image classification, speech recognition, and text classification, the accuracy of DNNs has even surpassed that of humans, endowing artificial intelligence with amazing capabilities. However, research in recent years has shown that DNNs are highly vulnerable to adversarial examples, which can lead to incorrect predictions by adding small, imperceptible perturbations to normal inputs. This phenomenon has prompted deep reflection on the robustness of DNN models. Adversarial attacks and defenses have been well studied in the field of computer vision, and researchers strive to develop robust models against various attacks, but research in the text domain is still relatively insufficient. Many methods from the computer vision domain cannot be applied directly to text; in particular, the discrete input space of text makes attacks and defenses more challenging. Thus, this field still holds substantial research potential. This article presents a comprehensive introduction to adversarial attacks and defenses in the text domain, together with related work. Specifically, we first classify textual adversarial attacks and defenses from different perspectives, so that the different types of attacks and defenses can be better understood and studied. We then present the corresponding related work and recent advances, including performance comparisons of different attack and defense methods and analyses of experimental results. Finally, we discuss the existing challenges of adversarial attacks and defenses in the text domain, such as how to generate higher-quality adversarial examples while maintaining semantic similarity, and how to prevent the model from learning only shallow features. These challenges provide researchers with valuable opportunities for exploration, and future research can focus on these directions to promote the development of adversarial attacks and defenses in the text domain.
Key words: deep neural networks; adversarial examples; adversarial attacks and defenses