A Method for Detecting Human-face-tampered Videos Based on Interframe Difference

ZHANG Yixuan; LI Gen; CAO Yun; ZHAO Xianfeng

Cite this article:

ZHANG Yixuan, LI Gen, CAO Yun, ZHAO Xianfeng. A Method for Detecting Human-face-tampered Videos based on Interframe Difference[J]. Journal of Cyber Security, 2020, 5(2): 49-72

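(1. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China)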

Abstract:

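In recent years, as computer hardware has advanced and deep learning has matured, new multimedia tampering tools have made it much easier to tamper with faces in videos. Face-tampered videos produced with these tools are almost imperceptible to the naked eye, so effective detection methods are urgently needed. The currently popular techniques for tampering with faces in videos are mainly Deepfake, which is based on autoencoders, and Face2face, which is based on computer graphics. We observe that the interframe differences of the face region in face-tampered videos are significantly larger than those in untampered videos, so the differences between face images in adjacent frames can serve as an important clue for tampering detection. In this paper, we propose a new detection framework for face-tampered videos based on interframe differences. We first verify the framework with a detection method based on traditional handcrafted features, namely Local Binary Pattern (LBP) and Histogram of Oriented Gradient (HOG) features. We then incorporate a deep-learning detection method based on a Siamese network, which further strengthens the feature representation of face images and improves detection performance. On the FaceForensics++ dataset, the LBP/HOG-based method achieves relatively high detection accuracy, while the Siamese-network-based method achieves higher accuracy and is also fairly robust. Here, robustness means that a detection method maintains relatively high accuracy under three conditions: when the differences between face images in adjacent frames are represented in two different ways, when frame pairs are extracted at three different intervals to compute interframe differences, and when the training and test sets have different compression rates.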
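The abstract does not say how the interframe difference is computed, so the following is only a minimal sketch of the general idea: pair each face crop with the crop a fixed number of frames later and measure how much the pixels change. The names `frame_pairs`, `mean_abs_difference`, and `video_difference_score`, the grayscale nested-list frame format, and the choice of mean absolute pixel difference are all assumptions made for illustration, not the authors' representation.

```python
from typing import List, Tuple

# Assumption: a face crop is a grayscale image stored as rows of 0..255 ints.
Frame = List[List[int]]


def frame_pairs(frames: List[Frame], interval: int) -> List[Tuple[Frame, Frame]]:
    """Pair each frame with the frame `interval` positions later."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return [(frames[i], frames[i + interval]) for i in range(len(frames) - interval)]


def mean_abs_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized face crops."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("face crops must have the same shape")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("face crops must not be empty")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / count


def video_difference_score(frames: List[Frame], interval: int = 1) -> float:
    """Average the per-pair differences over the whole clip (hypothetical score)."""
    pairs = frame_pairs(frames, interval)
    if not pairs:
        raise ValueError("not enough frames for the requested interval")
    return sum(mean_abs_difference(a, b) for a, b in pairs) / len(pairs)
```

Varying `interval` corresponds loosely to extracting frame pairs at different intervals. A real pipeline would first detect and align the face region in every frame, and would feed the differences to a trained classifier rather than use a raw score.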

Keywords: video tampering; tampering detection; interframe difference; Siamese network; Deepfake; Face2face

DOI: 10.19363/J.cnki.cn10-1380/tn.2020.02.05

Received: 2019-12-20; Revised: 2020-03-09

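Funding: This work was supported by the National Key Research and Development Program of China (No.19QY2202, No.19QY(Y)0207) and the Climbing Program of the Institute of Information Engineering, Chinese Academy of Sciences.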

A Method for Detecting Human-face-tampered Videos based on Interframe Difference

ZHANG Yixuan, LI Gen, CAO Yun, ZHAO Xianfeng

(1. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China)

Abstract:

With the continual upgrading of computer hardware and the rapid development of deep learning in recent years, new multimedia tampering tools have made it easier to tamper with human faces in videos. Face-tampered videos created with these tools can hardly be noticed by the naked eye, so effective detection methods are urgently needed. At present, the popular techniques for tampering with faces in videos are mainly the autoencoder-based Deepfake and the computer-graphics-based Face2face. We have noticed that interframe differences in the face regions of face-tampered videos are significantly greater than those of untampered videos, so the differences between face images in adjacent frames can be used as an important clue for tampering detection. In this paper, we propose a new detection framework for face-tampered videos based on interframe differences. We first use a detection method based on traditional handcrafted features, namely Local Binary Pattern (LBP) and Histogram of Oriented Gradient (HOG) features, to verify the effectiveness of the proposed framework. Then, with a deep-learning-based detection method built on a Siamese network, we further strengthen the feature representation of face images to improve detection performance. On the FaceForensics++ dataset, the LBP/HOG-feature-based method achieves relatively high detection accuracy, while the Siamese-network-based method reaches higher accuracy and is relatively robust. Here, robustness means that a detection method reaches relatively high detection accuracy in three situations: when the differences between face images in adjacent frames are expressed in two different ways, when frame pairs are extracted at three different intervals to calculate interframe differences, and when the training and testing datasets have different compression rates.
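To make the handcrafted-feature idea concrete, here is a small sketch of the basic 3x3 LBP operator and one possible way to turn two face crops into a difference feature. It is an illustration under stated assumptions: the abstract does not give the paper's LBP variant, histogram layout, or difference representation, and `lbp_code`, `lbp_histogram`, and `histogram_difference` are hypothetical helper names.

```python
from typing import List

# Assumption: a face crop is a grayscale image stored as rows of 0..255 ints.
Image = List[List[int]]

# Clockwise 8-neighbour offsets (row, col), starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Basic LBP code of an interior pixel: one bit per neighbour >= centre."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


def histogram_difference(a: Image, b: Image) -> List[float]:
    """Element-wise difference of two crops' LBP histograms (assumed feature)."""
    return [x - y for x, y in zip(lbp_histogram(a), lbp_histogram(b))]
```

A vector such as the one returned by `histogram_difference` could then be passed to any conventional classifier. HOG features would be handled analogously but are omitted here to keep the sketch short.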

Key words: video tampering; tampering detection; interframe difference; Siamese network; Deepfake; Face2face