Spoofed Speech Detection in Real-World Fraudulent Communication Scenarios
XU Zhe, CHENG Peng, BA Zhongjie, HUANG Peng, REN Kui
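(School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094; State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, Zhejiang 310027; State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, Zhejiang 310027; School of Cyberspace Security, Zhejiang University, Hangzhou, Zhejiang 310027; College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027; School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094; State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, Zhejiang 310027; School of Cyberspace Security, Zhejiang University, Hangzhou, Zhejiang 310027; College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027)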
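Abstract:
In recent years, research on audio spoofing detection has focused mainly on improving generalization to unseen spoofing algorithms, typically by optimizing feature-extraction architectures and by introducing data augmentation strategies such as random noise and frequency perturbation to improve robustness. In real-world deployments, however, and especially over the communication links used in telecom fraud, nonlinear distortions such as compression coding, network jitter, and differences between terminal devices significantly interfere with detection. How well existing data augmentation methods reproduce such complex distortions remains unclear, and their ability to improve detection in real scenarios is still in question. In addition, publicly available spoofed speech datasets and systematic evaluation frameworks that target these complex distortion scenarios are generally lacking. To fill this gap, this paper designs and implements a combined hardware-software system for simulating real voice fraud, which reproduces with high fidelity the distortion introduced during telephone calls and calls over social apps such as WeChat. Using this system, we build a large-scale spoofed speech dataset recorded under real voice communication link conditions. On this basis, we systematically evaluate mainstream spoofing detection models and typical augmentation strategies under link distortion. The experiments show that, although data augmentation improves detection on real-link data, model performance after real-link transmission still drops significantly relative to performance on the original data. After training with the real-link distorted samples constructed in this paper, model performance improves markedly: the equal error rate (EER) falls from 19.16% to 6.48% on the telephone link and from 15.76% to 7.72% on the WeChat link. Furthermore, building on the RealLink dataset, we propose and validate a representation-based distortion-resistant detection method that corrects degraded features with a lightweight restoration module; it recovers, to some extent, the spoofing traces masked by link distortion and yields additional performance gains. This study reveals the critical impact of communication link distortion on the performance of spoofed speech detection systems, provides a reproducible, open-source, high-fidelity link-distortion dataset, and offers data support and methodological reference for the design and evaluation of robust spoofed audio detection systems in real voice fraud scenarios.
Keywords:  audio spoofing detection  communication link distortion  telecom fraud  spoofed speech dataset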
DOI:10.19363/J.cnki.cn10-1380/tn.2025.11.14
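Received: 2025-06-30  Revised: 2025-11-11
Funding: This work was supported by the General Program of the National Natural Science Foundation of China (No. 62472372, No. 62172359), the Zhejiang Provincial Major Project (No. LD24F020010), and the Zhejiang Province "Pioneer and Leading Goose + X" (尖兵领雁+X) Science and Technology Program.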
Spoofed Speech Detection in Real-World Fraudulent Communication Scenarios
XU Zhe,CHENG Peng,BA Zhongjie,HUANG Peng,REN Kui
School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing 210094, China;State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310027, China;State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310027, China;School of Cyberspace Security, Zhejiang University, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing 210094, China;State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310027, China;School of Cyberspace Security, Zhejiang University, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract:
In recent years, research on audio spoofing detection has primarily focused on improving the generalization ability of models against unknown spoofing algorithms. Common approaches include optimizing feature extraction architectures and introducing data augmentation strategies such as random noise and frequency perturbations to enhance model robustness. However, in real-world applications, particularly in telecommunication fraud scenarios, nonlinear distortions such as codec compression, network jitter, and device variability can significantly degrade detection performance. The effectiveness of existing data augmentation methods in reproducing such complex distortions remains uncertain, raising doubts about their ability to improve detection under real conditions. Moreover, there is a general lack of publicly available spoofed speech datasets and systematic evaluation protocols that reflect these realistic distortion scenarios. To address this gap, this paper proposes a hardware-software integrated simulation system capable of faithfully reproducing the distortion process in real-world voice communication channels, including telephone calls and social messaging platforms such as WeChat. Based on this system, we construct a large-scale spoofed speech dataset under realistic communication link conditions. On this foundation, we systematically evaluate several state-of-the-art spoofing detection models and representative data augmentation strategies under transmission-induced distortions. Experimental results show that although data augmentation provides some improvement on real-link data, model performance still drops significantly compared to clean conditions. After incorporating the proposed real-link distorted samples into training, the model performance was significantly improved: the Equal Error Rate (EER) on the telephone link decreased from 19.16% to 6.48%, and the EER on the WeChat link decreased from 15.76% to 7.72%. Furthermore, based on the RealLink dataset, we propose and validate a representation-based anti-distortion detection method, in which a lightweight restoration module is employed to refine the degraded features, thereby partially recovering spoofing cues obscured by channel distortions and yielding additional performance gains. This study highlights the critical impact of communication link distortions on spoofing detection performance, presents a reproducible and open-source high-fidelity distortion dataset, and offers both data support and methodological insights for building robust detection systems in real-world voice fraud scenarios.
Key words:  audio spoofing detection  transmission distortion  telecom fraud  spoofed speech dataset
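
Note on the metric: the abstract reports its results as equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how EER is commonly computed from detector scores. It is not the authors' evaluation code; the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector's scores.

    Assumed convention (hypothetical, not from the paper):
    a higher score means "more likely bona fide".
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide samples scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed samples scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; with finitely many
    # samples, take the threshold where they are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented purely for illustration.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Read this way, the drop from 19.16% to 6.48% on the telephone link means that, at the balanced operating point, both error types fall from roughly one in five samples to roughly one in fifteen.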