Cite this article
  • Peng Yufan, Ge Yunjie, Huang Jiaxing, Liu Yuchen, Zhao Lingchen. Delusive Data Poisoning Attacks against Intelligence Audio Systems[J]. Journal of Cyber Security, accepted.


DOI:
Submitted: 2025-06-30; Revised: 2025-11-10
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Delusive Data Poisoning Attacks against Intelligence Audio Systems
Peng Yufan1, Ge Yunjie2,3, Huang Jiaxing1, Liu Yuchen1, Zhao Lingchen1
(1. Wuhan University; 2. Wuhan Institute for Math & AI)
Abstract:
With the widespread deployment of intelligent speech systems (such as voiceprint recognition and voice command recognition), their security has become an increasingly prominent concern. Existing research focuses mainly on adversarial example attacks at test time, while data poisoning threats at training time have not been studied systematically. More critically, most current work analyzes these systems end to end as black boxes and does not examine the security properties of their internal components. Through an empirical study, this paper finds that the acoustic feature extractor, a core component of the speech processing pipeline, may introduce new attack surfaces because of its specific signal processing mechanisms. Specifically, this paper is the first to reveal a "human-machine perception difference" in acoustic feature extraction: the frequency-domain analysis performed by traditional feature extraction algorithms, such as Mel-frequency cepstral coefficients (MFCC) and filter banks (FBANK), is inconsistent with the characteristics of human auditory perception. Based on this finding, we propose SpecTox, a new data poisoning attack. SpecTox adds slight noise that exploits the nonlinearity of the feature space, so that small perturbations can cause significant feature shifts. By calibrating against the feature vectors of unrelated audio samples, SpecTox enlarges the difference between the new and the original feature vectors and thereby confuses the features the model needs to learn. We verify the effectiveness of SpecTox by attacking three representative intelligent audio systems: speaker identification (SI), speech command recognition (SCR), and automatic speech recognition (ASR). Experiments show that when only 1% of the training data is poisoned, the accuracy of the VggVox model drops by 13.2%.
Key words: data poisoning attacks; clean-label attacks; intelligent audio systems
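The abstract attributes the new attack surface to the nonlinearity of acoustic feature extraction. The Python sketch below is a toy illustration of that general effect only. It is not SpecTox, whose algorithm and calibration step are not described here. It computes log band energies over a naive DFT using a hypothetical rectangular filterbank (`bands`, not mel-spaced) and a small `floor` constant, both assumptions, and it compares two perturbations of equal energy: one placed at the frequency of the clean tone, and one placed in an otherwise empty band. All function names (`power_spectrum`, `log_filterbank`, `tone`, `feature_shift`) are illustrative.

```python
import math


def power_spectrum(x):
    """Naive O(N^2) DFT power spectrum |X[k]|^2 for bins k = 0..N/2."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec


def log_filterbank(x, bands, floor=1e-10):
    """Log energy per band. `bands` holds (lo, hi) bin ranges of a
    hypothetical rectangular filterbank (not mel-spaced)."""
    spec = power_spectrum(x)
    return [math.log(sum(spec[lo:hi]) + floor) for lo, hi in bands]


def tone(n, k, amp):
    """Pure sine that falls exactly on DFT bin k (no spectral leakage)."""
    return [amp * math.sin(2 * math.pi * k * t / n) for t in range(n)]


def feature_shift(clean, noise, bands):
    """Euclidean distance between the log-filterbank features of clean and clean + noise."""
    before = log_filterbank(clean, bands)
    after = log_filterbank([c + d for c, d in zip(clean, noise)], bands)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(before, after)))


if __name__ == "__main__":
    N = 64
    BANDS = [(0, 8), (8, 16), (16, 24), (24, N // 2 + 1)]
    clean = tone(N, 4, 1.0)          # all signal energy lies in band 0
    same_band = tone(N, 4, 0.01)     # small perturbation inside the loud band
    empty_band = tone(N, 20, 0.01)   # equal-energy perturbation in silent band 2
    print(feature_shift(clean, same_band, BANDS))   # about 0.02
    print(feature_shift(clean, empty_band, BANDS))  # about 20.7
```

Because the logarithm maps an energy change of ΔE to roughly ΔE/E, the same small perturbation barely moves a loud band but shifts a near-empty band by about 20 log units in this toy setup. The size of that jump depends directly on the assumed `floor`. Real MFCC/FBANK pipelines also apply framing, windowing, mel-spaced triangular filters, and (for MFCC) a DCT, all of which this sketch omits.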