Cite this article:
  • TAI Jianwei, LI Yakai, JIA Xiaoqi, HUANG Qingjia. A Survey: Attacks and Countermeasures of Adversarial Examples for Speech Recognition System[J]. Journal of Cyber Security, 2022, 7(5): 51-64.



DOI:10.19363/J.cnki.cn10-1380/tn.2022.09.05
Received: 2019-12-31; Revised: 2020-03-06
Foundation items: This work was supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; the Beijing Key Laboratory of Network Security and Protection Technology; the Beijing Municipal Science and Technology Project (No. Z191100007119010); and the National Natural Science Foundation of China (No. 61772078).
A Survey: Attacks and Countermeasures of Adversarial Examples for Speech Recognition System
TAI Jianwei1,2, LI Yakai1,2, JIA Xiaoqi1,2, HUANG Qingjia1,2
(1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract:
Voice is a common and effective way for humans to communicate with modern intelligent devices such as smartphones and smart household appliances. With the significant progress of computer and network technology, speech recognition systems have been widely deployed: they interpret the voice commands issued by users into digital instructions or signals that intelligent devices can understand, enabling remote interaction between users and these devices. In recent years, advances in deep learning have driven the development of speech recognition systems and continuously improved their accuracy and usability. However, deep learning itself still has unsolved security problems, such as adversarial examples. An adversarial example is crafted by adding subtle perturbations to an input at the model's prediction stage, so that the model outputs a wrong target class with high confidence. Current research on adversarial-example attacks and defenses focuses mainly on computer vision and largely ignores the security of speech recognition models, yet state-of-the-art speech recognition systems, because they also rely on deep learning, face the same serious threat from adversarial-example attacks. In response to this risk, this paper provides a systematic survey of adversarial-example attacks and countermeasures for speech recognition systems. We first summarize the basic attack principles behind different types of speech adversarial examples. We then comprehensively compare state-of-the-art methods for generating speech adversarial examples and discuss their advantages and disadvantages. Finally, to help build more secure speech recognition systems, we discuss existing defense strategies against speech adversarial examples and outline future research directions in this field.
Key words: speech recognition system; adversarial examples; defense strategies; deep learning
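The targeted adversarial-example definition given in the abstract (add a subtle perturbation so the model outputs an attacker-chosen class with high confidence) can be sketched with a toy example. The snippet below is a minimal illustration only, assuming a hypothetical linear 3-class classifier over a 16-dimensional "audio feature" vector and an iterative FGSM-style attack; real speech attacks perturb waveforms and target deep models (e.g., via a CTC loss), not a linear model.

```python
import numpy as np

# Hypothetical toy model: a linear 3-class classifier over 16-dim features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))

def logits(x):
    return W @ x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x0 = rng.normal(size=16)                 # benign input
target = int(np.argmin(logits(x0)))      # attacker-chosen (least likely) class

# Iterative targeted FGSM: take small signed-gradient steps that decrease the
# cross-entropy loss toward `target`, projecting back into an L-infinity ball
# of radius eps around the benign input so the perturbation stays bounded.
eps, step = 1.0, 0.02
x = x0.copy()
for _ in range(300):
    if int(np.argmax(logits(x))) == target:
        break                            # model now predicts the target class
    p = softmax(logits(x))
    grad = (p - np.eye(3)[target]) @ W   # d(loss)/dx for the linear model
    x = x0 + np.clip(x - step * np.sign(grad) - x0, -eps, eps)

adv_pred = int(np.argmax(logits(x)))
print("target:", target, "prediction on perturbed input:", adv_pred)
```

The projection step is what keeps the perturbation "subtle" in the L-infinity sense; audio attacks surveyed in the paper additionally exploit psychoacoustics so the perturbation is hard for humans to hear.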