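(State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China)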
Key words:  video face swapping  neural network detection  convolutional long short-term memory network  feature pyramid network
Multi-scale Time-Spatial Domain Detection of Fabricated Face Video Based on 3D Convolution
BAO Han,FU Haocheng,CAO Yun,ZHAO Xianfeng,TANG Peng
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China
Deepfake technology has developed rapidly in recent years and can be used to forge videos. The abuse of such fake videos has caused serious harm to society and has attracted widespread attention from governments and the public. Based on a thorough investigation, this paper finds that the current mainstream generation methods leave forgery traces and generation losses in both the temporal and spatial domains. However, most current neural-network-based algorithms for detecting fabricated face videos consider only the spatial-domain features of a single image and suffer from overfitting, which leads to low accuracy in real-world detection. To address these shortcomings, this paper evaluates state-of-the-art Deepfake face detection algorithms and proposes an effective detection algorithm that combines spatial and temporal features. For the individual frames of a video, we present a fully convolutional network to extract spatial features. This module adopts a 3D convolution structure, which can accurately extract the forgery traces of each frame in the video frame array. For the frame array as a whole, we build a module based on a convolutional network with Long Short-Term Memory (LSTM) to extract temporal features. This module can detect temporal forgery traces between fake video frames. Finally, we apply Feature Pyramid Networks (FPN) to improve the accuracy of face classification. This structure fuses spatio-temporal features of different sizes; the multi-scale fusion improves classification and reduces overfitting. Comparative experiments demonstrate that the proposed method is more effective in terms of training convergence and classification accuracy. In addition, it uses fewer parameters while achieving high detection accuracy, resulting in higher training efficiency than existing methods.
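The multi-scale idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' network: the clip size (16 frames of 224x224), the 3x3x3 kernels, the strides, and all function names below are hypothetical choices made only for illustration. The first part applies the standard convolution output-size formula to show how stacked 3D convolutions could yield feature maps at several spatial scales while keeping every frame for a temporal module. The second part shows the top-down merge used in FPN-style fusion, simplified to nearest-neighbour upsampling plus element-wise addition (the 1x1 lateral convolutions of a real FPN are omitted).

```python
from typing import List, Tuple

Shape3D = Tuple[int, int, int]  # (frames, height, width)


def conv3d_output_shape(size: Shape3D, kernel: Shape3D,
                        stride: Shape3D, padding: Shape3D) -> Shape3D:
    """Output size of a 3D convolution along (time, height, width).

    Uses the standard formula floor((n + 2p - k) / s) + 1 per axis.
    """
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(size, kernel, stride, padding))


def pyramid_shapes(clip: Shape3D, levels: int) -> List[Shape3D]:
    """Feature-map sizes after `levels` stacked 3x3x3 convolutions.

    Assumed (hypothetical) settings: stride 1 in time, so no frame is
    dropped, and stride 2 in space, so each level halves height and width.
    """
    shapes = []
    size = clip
    for _ in range(levels):
        size = conv3d_output_shape(size, (3, 3, 3), (1, 2, 2), (1, 1, 1))
        shapes.append(size)
    return shapes


def upsample2x(fmap: List[List[float]]) -> List[List[float]]:
    """Nearest-neighbour 2x upsampling of a 2D map stored as nested lists."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def fuse_top_down(fine: List[List[float]],
                  coarse: List[List[float]]) -> List[List[float]]:
    """FPN-style merge: upsample the coarse map, add it to the fine map."""
    up = upsample2x(coarse)
    return [[f + u for f, u in zip(frow, urow)]
            for frow, urow in zip(fine, up)]


if __name__ == "__main__":
    print(pyramid_shapes((16, 224, 224), levels=3))
    print(fuse_top_down([[1, 2], [3, 4]], [[10]]))
```

Under these assumed settings the temporal length stays at 16 at every level while the spatial size halves each time, which is the kind of multi-scale layout a pyramid fusion step can combine.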
Key words:  deepfake videos  neural network detection  convolutional long short-term memory  feature pyramid networks