A Review of Speech Adversarial Attack and Defense Methods

XU Dongwei; FANG Ruochen; JIANG Bin; XUAN Qi

Cite this article:

XU Dongwei, FANG Ruochen, JIANG Bin, XUAN Qi. A Review of Speech Adversarial Attack and Defense Methods[J]. Journal of Cyber Security, 2022, 7(1): 126-144

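(1. Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; 2. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China)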

Abstract:

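With the continuing development of artificial intelligence, interaction between humans and machines has become critically important. Speech is a key means of communication between people and intelligent communication devices and has advanced rapidly in recent years; speaker recognition, emotion recognition and speech recognition are now widely deployed. In particular, with the rise of deep learning, deep-learning-based speech technology enables machines to understand speech content and recognize speakers at a level approaching that of humans, with unprecedented gains in both efficiency and accuracy. Applications include mobile voice assistants, voice control of smart home appliances, banking services, and remote user verification to prevent fraud. Precisely because speech is so widespread, however, its security has drawn public attention: research shows that deep neural networks (DNNs) used for speech tasks are vulnerable to adversarial attacks. That is, an attacker can fool a DNN model by adding imperceptible perturbations to the original speech; the resulting adversarial examples sound no different to the human ear but are misclassified by the model. This phenomenon was first observed in computer vision and has since attracted research interest in the audio domain. Against this background, this paper presents a detailed survey of recent research on adversarial attacks and defenses in the speech domain. First, we categorize speech tasks by application scenario and introduce the mainstream tasks and their development background. Second, we define speech adversarial attacks and, according to their application scenarios, introduce digital attacks and physical attacks separately. Third, we summarize defense methods against speech adversarial examples, dividing them into adversarial defense and adversarial detection. Finally, we discuss the shortcomings, prospects and future directions of this field.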
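The definition of a speech adversarial example given in the abstract (a small added perturbation that leaves the audio perceptually unchanged but makes the model predict incorrectly) can be illustrated with a minimal sketch. It applies the one-step fast gradient sign method (FGSM), a technique that originated in the vision literature, to a hypothetical toy classifier: a fixed logistic-regression model over one second of raw samples at an assumed 16 kHz rate, standing in for a real speech DNN. The per-sample bound `eps`, the 440 Hz test tone, and all names (`predict_proba`, `fgsm`, `rel_db`) are illustrative assumptions, not taken from the surveyed papers. Perceptibility is summarized as the peak level of the perturbation relative to the signal, dB(delta) - dB(x), similar to the measure Carlini and Wagner used for audio adversarial examples.

```python
import numpy as np

SR = 16000  # assumed sample rate in Hz; one second of audio is SR samples
rng = np.random.default_rng(0)

# Hypothetical stand-in for a speech DNN: a fixed logistic-regression model on raw
# waveform samples. It only keeps the sketch self-contained and its gradient exact.
w = rng.standard_normal(SR) / np.sqrt(SR)
b = 0.0


def predict_proba(x: np.ndarray) -> float:
    """Probability that waveform x belongs to class 1 under the toy model."""
    z = float(w @ x + b)
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x: np.ndarray, y: int, eps: float) -> np.ndarray:
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss), so |x_adv - x| <= eps per sample."""
    p = predict_proba(x)
    grad = (p - y) * w  # gradient of binary cross-entropy w.r.t. x for a logistic model
    return np.clip(x + eps * np.sign(grad), -1.0, 1.0)  # stay in the [-1, 1] amplitude range


def rel_db(delta: np.ndarray, x: np.ndarray) -> float:
    """Peak level of the perturbation relative to the signal, in dB (more negative = quieter)."""
    return 20 * np.log10(np.max(np.abs(delta))) - 20 * np.log10(np.max(np.abs(x)))


# Clean input: a 440 Hz tone at amplitude 0.5 (a placeholder for real speech).
t = np.arange(SR) / SR
x = 0.5 * np.sin(2 * np.pi * 440 * t)
y = int(predict_proba(x) > 0.5)  # treat the model's clean prediction as the true label

x_adv = fgsm(x, y, eps=0.02)
y_adv = int(predict_proba(x_adv) > 0.5)
delta = x_adv - x

print(f"clean label {y}, adversarial label {y_adv}, "
      f"max|delta| = {np.max(np.abs(delta)):.3f}, relative level = {rel_db(delta, x):.1f} dB")
```

Because the toy model is linear, a single signed-gradient step shifts its logit by `eps` times the sum of `|w|` (about 2 here), which is enough to flip the decision while the perturbation peak stays roughly 28 dB below the tone's peak. Attacks on real speech models typically rely on iterative optimization and, for physical (over-the-air) attacks, on robustness to playback and recording distortions; this sketch only illustrates the definition.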

Key words: deep neural network; speech recognition; adversarial attack; adversarial defense; artificial intelligence security

DOI: 10.19363/J.cnki.cn10-1380/tn.2022.01.09

Submitted: 2021-07-06; Revised: 2021-10-27

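Funding: This work was supported by the National Natural Science Foundation of China (No. 61903334) and the Natural Science Foundation of Zhejiang Province (No. LY21F030016).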
