Evading Attacks for DeepFake Fake Model Traceability

WU Mengjie; YU Jiayi; WANG Run; YE Xi; ZHANG Yuyang; LIN Chenhao; FANG Liming; WANG Lina

Cite this article:

WU Mengjie, YU Jiayi, WANG Run, YE Xi, ZHANG Yuyang, LIN Chenhao, FANG Liming, WANG Lina. Evading Attacks for DeepFake Fake Model Traceability[J]. Journal of Cyber Security (信息安全学报), 2026, 11(2): 80-94

WU Mengjie¹, YU Jiayi¹, WANG Run¹, YE Xi¹, ZHANG Yuyang¹, LIN Chenhao², FANG Liming³, WANG Lina¹
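(1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; 2. Xi'an Jiaotong University, Xi'an 710049, China; 3. Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)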

Abstract:

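In recent years, the proliferation of deepfake technology (DeepFake) has caused great alarm among the public and prominent figures. These highly realistic forged images and videos can spread disinformation at scale, damage reputations, and may even trigger social unrest. To counter such forged images and videos, research on DeepFake forensics has attracted wide attention. In current DeepFake forensics research, DeepFake detection determines whether a given sample is real or fake, while DeepFake traceability aims to trace the type of forgery model that generated the DeepFakes, giving detection results that are easier to interpret. Specifically, DeepFake traceability falls into two categories, model-architecture traceability and model-instance traceability: the former infers only the model architecture that was used, while the latter tries to identify the model instance with its specific training settings. Both kinds of method rely on identifying specific traces left by the DeepFake generation process, and a savvy attacker can destroy or tamper with these traces to defeat traceability. This paper observes that the traces used for model traceability exist in both the high-frequency and the low-frequency components and play different roles in the traceability process. Based on this observation, the paper proposes, for the first time, a training-free evasion attack named TraceEvader, and tests it in the no-box setting, which is closest to real-world conditions. Specifically, TraceEvader injects a universal imitation trace learned from original DeepFakes into the high-frequency component and introduces adversarial blur into the low-frequency component to confuse the extraction of certain traces, thereby evading model traceability. Experiments cover four state-of-the-art model traceability techniques, evaluated on forged images generated by eight generative models, including Generative Adversarial Networks (GANs) and Diffusion Models (DMs). The results show that TraceEvader achieves the highest average attack success rate of 79% and remains robust against image transformations and dedicated denoising techniques, with the average attack success rate staying around 75%. TraceEvader confirms the limitations of current model traceability techniques and urges DeepFake researchers and practitioners to explore more robust model traceability techniques.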

Key words: DeepFake; DeepFake attribution; adversarial attack; forged face

DOI: 10.19363/J.cnki.cn10-1380/tn.2026.03.05

Received: 2024-08-22; Revised: 2024-10-29

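Funding: This work was supported by the National Key R&D Program of China, Young Scientists Project (No. 2021YFB3100700), the National Natural Science Foundation of China (No. 62202340, No. 62372334), the Key Project of the Open Fund of the Henan Key Laboratory of Cyberspace Situation Awareness (No. HNTS2022004), the Wuhan Knowledge Innovation Program (No. 2022010801020127), the Fundamental Research Funds for the Central Universities (No. 2042023kf0121), and the CCF-NSFOCUS "Kunpeng" Research Fund (No. CCF-NSFOCUS 2023005).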

Evading Attacks for DeepFake Fake Model Traceability

WU Mengjie¹, YU Jiayi¹, WANG Run¹, YE Xi¹, ZHANG Yuyang¹, LIN Chenhao², FANG Liming³, WANG Lina¹

(1.Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China;2.Xi'an Jiaotong University, Xi'an 710049, China;3.Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)

Abstract:

In recent years, the proliferation of DeepFakes has caused great alarm among the public and prominent figures. These highly realistic fake images and videos can spread disinformation on a large scale, cause reputational harm, and may even trigger social unrest. To counter such fake images and videos, research in DeepFake forensics has received wide attention. In current DeepFake forensics research, DeepFake detection judges whether a given sample is real or fake, while DeepFake traceability aims to trace the type of forgery model that generated the DeepFakes, so as to provide more interpretable results for DeepFake detection. Specifically, DeepFake traceability can be divided into model-architecture traceability and model-instance traceability, where model-architecture traceability only infers the specific model architecture used, while model-instance traceability attempts to identify model instances with specific training settings. Both model-architecture and model-instance traceability methods rely on identifying specific traces left by the DeepFake generation process; savvy attackers can destroy or tamper with these traces, rendering the traceability techniques ineffective. We observe that the specific traces used for model traceability exist in both high-frequency and low-frequency components and play different roles in the traceability process. Based on this, this paper proposes, for the first time, a training-free evasion attack named TraceEvader, and tests it in the most practical no-box setting. Specifically, TraceEvader injects a universal imitation trace learned from the original DeepFakes into the high-frequency component and introduces adversarial blur into the low-frequency component to confuse the extraction of certain traces, thereby evading model traceability.
In this paper, we experiment with four state-of-the-art model traceability techniques and evaluate them on forged images generated by eight generative models, including Generative Adversarial Networks (GANs) and Diffusion Models (DMs). The results show that TraceEvader achieves the highest average attack success rate of 79% and remains robust against image transformations and dedicated denoising techniques, with the average attack success rate staying around 75%. TraceEvader confirms the limitations of current model traceability techniques and reminds DeepFake researchers and practitioners to explore more robust model traceability techniques.

Key words: DeepFake; DeepFake attribution; adversarial attack; forged face
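
The abstract describes TraceEvader as acting separately on the two frequency bands of a DeepFake image: a trace is injected into the high-frequency component and blur is applied to the low-frequency component. The following is a minimal NumPy sketch of that split only, not the authors' implementation. It assumes an ideal FFT low-pass mask for the decomposition, a random array in place of the learned universal imitation trace, and a plain box blur in place of the adversarial blur. The function names and the parameters `radius`, `trace_strength`, and `blur_size` are hypothetical.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the centered FFT spectrum.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)  # distance from the DC bin
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # by construction, low + high == image
    return low, high


def box_blur(image, k=3):
    """Mean filter with edge padding (k is assumed odd).

    Stand-in for the adversarial blur, which the abstract does not specify.
    """
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def perturb(image, trace, radius=8, trace_strength=0.05, blur_size=3):
    """Blur the low band, add a trace to the high band, and recombine."""
    low, high = split_frequencies(image, radius)
    # Keep only the high-frequency part of the (hypothetical) trace.
    _, trace_high = split_frequencies(trace, radius)
    attacked = box_blur(low, blur_size) + high + trace_strength * trace_high
    return np.clip(attacked, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake = rng.random((64, 64))            # placeholder for a DeepFake image in [0, 1]
    trace = rng.standard_normal((64, 64))  # placeholder for a learned trace
    attacked = perturb(fake, trace)
    print("mean absolute change:", np.abs(attacked - fake).mean())
```

In this sketch the two bands are modified independently and then summed, so the strength of each perturbation can be tuned on its own. How the actual trace is learned and how the blur is made adversarial are not stated in the abstract and are left out here.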