Cite this article:
- Huang Yihuan, Peng Li, Ren Yanzhen, Wang Lina. A Robust Forged Face Detection Scheme Based on Speech-Related Facial Landmarks [J]. Journal of Cyber Security, Accepted.
DOI:
Received: 2024-07-19; Revised: 2024-12-11
Funding: National Natural Science Foundation of China (Nos. 62172306, 62372334)
A Robust Forged Face Detection Scheme Based on Speech-Related Facial Landmarks
Huang Yihuan, Peng Li, Ren Yanzhen, Wang Lina
(Wuhan University)
Abstract:
The growing realism of generated audio-visual content poses a major challenge to forgery detection. From fake videos spread on social media platforms to misleading content in political propaganda, the potential risks are pervasive, making effective detection and prevention mechanisms for forged talking-face videos both urgent and important. However, current mainstream deepfake detection methods struggle to distinguish compression artifacts from forgery artifacts, which causes a significant drop in detection accuracy on highly compressed videos and in social-communication scenarios. To address this problem, we propose FALNet, a Facial-Landmark based Graph Attention Network for detecting forged talking-face videos, which decouples facial landmarks from the video. We introduce a robust landmark-based video feature extraction network and analyze both the muscle movements associated with speech and the forgery cues introduced when deepfake talking-face videos are generated. By analyzing muscle dynamics during speech, we design an adjacency matrix based on facial muscle movements; this matrix preserves the topological information of the face while effectively capturing the differences between genuine and fake facial features. Using a graph attention network as the backbone, we extract the facial features represented by this adjacency matrix. Furthermore, because temporal features are important in video forgery detection, we model both short-term and long-term features: a graph attention network first captures short-term features, and the resulting feature sequence is then fed into a recurrent neural network to model long-term dependencies. Experimental results show that FALNet achieves a detection accuracy above 98% on the video-forgery subsets of multiple public datasets. Compared with existing state-of-the-art methods based on facial landmarks, FALNet improves the AUC (Area Under the Curve) by 0.6% to 1.1%, and under compression its AUC remains above 94%, demonstrating good robustness.
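To make the muscle-motion adjacency matrix described in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. The paper does not specify its landmark scheme or edge set here, so this sketch assumes the common 68-point dlib layout (mouth at indices 48-67, jawline at 0-16), and the mouth-to-jaw cross-links are purely illustrative.

```python
import numpy as np

# Hypothetical landmark groups under the assumed 68-point dlib layout.
MOUTH = list(range(48, 68))   # outer lip 48-59, inner lip 60-67
JAW = list(range(0, 17))      # jawline contour

def muscle_adjacency(n_landmarks=68):
    """Build a symmetric adjacency matrix over speech-related landmarks.

    Chain edges along each contour preserve facial topology; a few
    mouth-to-jaw links (illustrative, not from the paper) approximate
    muscle co-movement during speech. Self-loops are added at the end,
    as is standard for graph attention networks.
    """
    A = np.zeros((n_landmarks, n_landmarks))
    for group in (MOUTH, JAW):
        for i, j in zip(group, group[1:]):
            A[i, j] = A[j, i] = 1.0
    A[48, 59] = A[59, 48] = 1.0            # close the outer-lip loop
    for m, j in [(48, 4), (54, 12), (57, 8)]:  # mouth corners/chin to jaw
        A[m, j] = A[j, m] = 1.0
    A += np.eye(n_landmarks)               # self-loops
    return A

A = muscle_adjacency()
```

Keeping the matrix symmetric with self-loops means every landmark attends to itself and its anatomical neighbours when the graph attention backbone aggregates features.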
Key words: deepfake detection; deep learning; robustness; facial landmarks
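The short-term/long-term modelling described in the abstract (per-frame graph attention followed by a recurrent network over frames) can be sketched as below. This is a toy NumPy reconstruction under assumed dimensions and random weights, not FALNet's actual architecture: the single-head attention layer, mean pooling, GRU cell, and final logit are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, A, W, a):
    """Single-head graph attention: scores only over edges in A."""
    Z = H @ W                                   # (N, d') projected features
    d = Z.shape[1]
    # attention logits a^T [z_i || z_j], decomposed into two dot products
    e = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]
    e = np.where(A > 0, np.maximum(0.2 * e, e), -1e9)  # LeakyReLU + mask
    alpha = softmax(e, axis=1)                  # normalize over neighbours
    return np.tanh(alpha @ Z)

def gru_step(x, h, p):
    """One GRU step for long-term temporal modelling."""
    z = 1 / (1 + np.exp(-(x @ p["Wz"] + h @ p["Uz"])))   # update gate
    r = 1 / (1 + np.exp(-(x @ p["Wr"] + h @ p["Ur"])))   # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])   # candidate state
    return (1 - z) * h + z * h_tilde

# Assumed sizes: 68 landmarks, 2-D coords, 8-D node features, 5 frames.
N, d_in, d_out, T, hdim = 68, 2, 8, 5, 16
A = (rng.random((N, N)) < 0.05).astype(float)
A = np.maximum(A, A.T) + np.eye(N)              # symmetric, self-loops
W = rng.normal(size=(d_in, d_out))
a = rng.normal(size=2 * d_out)
p = {k: rng.normal(scale=0.1, size=s) for k, s in {
    "Wz": (d_out, hdim), "Uz": (hdim, hdim),
    "Wr": (d_out, hdim), "Ur": (hdim, hdim),
    "Wh": (d_out, hdim), "Uh": (hdim, hdim)}.items()}

h = np.zeros(hdim)
for t in range(T):
    frame = rng.normal(size=(N, d_in))          # landmark coords, frame t
    short = gat_layer(frame, A, W, a).mean(axis=0)  # short-term feature
    h = gru_step(short, h, p)                   # long-term recurrence
score = 1 / (1 + np.exp(-h.sum()))              # toy real/fake score
```

The design point the sketch illustrates is the two-stage factorization: the graph attention layer reasons about spatial landmark relations within a frame, while the recurrence accumulates evidence across frames, so compression noise on any single frame is less likely to dominate the decision.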