Reviews on the Attack and Defense Methods of Voice Adversarial Examples

WEI Chunyu; SUN Meng; ZOU Xia; ZHANG Xiongwei

Cite this article:

WEI Chunyu, SUN Meng, ZOU Xia, ZHANG Xiongwei. Reviews on the Attack and Defense Methods of Voice Adversarial Examples[J]. Journal of Cyber Security, 2022, 7(1): 100-113

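(Lab of Intelligent Information Processing, College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China)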

Abstract:

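Speech is an important carrier of human-computer interaction. It conveys not only semantic information but also auxiliary information such as the speaker's gender, age, and emotion. Advances in deep learning have significantly improved performance across speech processing tasks, and intelligent speech products are now deployed in mobile terminals, in-vehicle devices, smart homes, and similar settings. Accurate recognition of speech is an essential basis for trusted interaction between people and devices, so security during speech transmission has also drawn wide attention. Adversarial example attacks have become a research hotspot in recent years: an attacker makes small modifications to a sample so that a deep learning model predicts incorrectly, which introduces potential security risks. Speech recognition faces the same threat from adversarial examples, yet its attack and defense methods differ significantly from those in fields such as image recognition. Studying attack and defense methods for voice adversarial examples is therefore important. After introducing the basic concepts of adversarial examples, this paper takes two typical speech recognition tasks, text content recognition and voiceprint (speaker) identity recognition, and systematically reviews representative voice adversarial attack methods from recent years. The review proceeds from white-box to black-box attacks, from digital to physical attacks, and from specific carriers to universal carriers, moving from easy to hard and progressively closer to real-world scenarios. Defense methods are then classified and discussed from the perspective of how the classification boundary is constructed, revealing the mechanism by which each class achieves defense. Finally, the technical difficulties of current attack and defense methods are analyzed and summarized, and future directions for voice adversarial attack and defense are discussed.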

Key words: adversarial examples; speech recognition; speaker recognition; attack; defense

DOI：10.19363/J.cnki.cn10-1380/tn.2022.01.07

Received: 2021-07-10; Revised: 2021-10-11

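Funding: This work was supported by the Jiangsu Provincial Outstanding Youth Fund (No. BK20180080) and the National Natural Science Foundation of China (No. 62071484).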

Reviews on the Attack and Defense Methods of Voice Adversarial Examples

WEI Chunyu, SUN Meng, ZOU Xia, ZHANG Xiongwei

(Lab of Intelligent Information Processing, College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China)

Abstract:

Speech plays an important role in human-computer communication. It carries not only textual and semantic information but also additional information about the gender, age, and emotion of the speaker. The development of deep neural networks has significantly improved performance on a wide range of speech processing tasks. As a result, products based on intelligent speech processing with deep learning have been deployed in mobile terminals, vehicle-mounted devices, smart homes, and so on. Accurate recognition of speech is an important basis for trusted interaction between humans and devices, so the security issues involved in speech transmission have attracted a lot of research. Fooling deep learning models with adversarial examples has been a hot research topic in recent years. An attacker can mislead a deep neural network by making only slight changes to a data example, which brings potential security risks to applications of deep learning models. Voice recognition also faces security threats from adversarial examples, but its attack and defense methods differ significantly from those in other fields (e.g., image recognition). It is therefore of great significance to study the attack and defense methods for voice adversarial examples. In this paper, after introducing the related concepts of adversarial examples, we take automatic speech-to-text recognition and speaker recognition as two typical tasks and summarize the typical adversarial attack methods against voice recognition systems in recent years, proceeding from white-box to black-box attacks, from digital to physical attacks, and from specific carrier voices to universal carrier voices. Furthermore, from the viewpoint of how classifier boundaries are configured, we categorize recently proposed defense methods and investigate how they work. Finally, we summarize the current technical difficulties of attack and defense methods for voice adversarial examples and discuss future directions.

Key words: adversarial examples; speech recognition; speaker recognition; attack; defense