Cite this article
  • Xiong Yimao, Ding Xiangling, Gu Qing, Yang Gaobo, Zhao Xianfeng. The Passive Forensics of Deep Video Inpainting [J]. Journal of Cyber Security, Accepted.


DOI:
Received: 2022-10-31; Revised: 2023-04-12
Funding: National Natural Science Foundation of China (62272160); Open Project of the State Key Laboratory of Information Security (2021-ZD-07); Open Fund of the Henan Key Laboratory of Cyberspace Situational Awareness (HNTS2022025)
The Passive Forensics of Deep Video Inpainting
Xiong Yimao1, Ding Xiangling1, Gu Qing1, Yang Gaobo2, Zhao Xianfeng3
(1. Hunan University of Science and Technology; 2. Hunan University; 3. Chinese Academy of Sciences)
Abstract:
Deep video inpainting uses deep learning to fill missing regions in a video or to remove specific target objects. It can also be exploited to synthesize tampered videos that are difficult to identify with the naked eye; in particular, maliciously inpainted videos spread on social media can easily provoke negative public opinion. Passive detection of deep video inpainting started relatively late and, although it has received some attention, remains far from sufficient in both the depth and breadth of research. This paper therefore proposes a passive forensics technique based on cascaded ConvGRUs and eight-direction local attention, which localizes the inpainted regions of tampered videos in the spatio-temporal domain. First, to extract more features of the inpainted regions, RGB frames and error-level analysis (ELA) frames are fed into the encoder in parallel, and multi-modal features at different scales are generated through channel-level feature fusion. Second, in the decoder, the multi-scale features produced by the encoder are fused channel-wise with cascaded ConvGRUs to capture temporal discontinuities between video frames. Finally, after the last-level RGB features of the encoder, an eight-direction local attention module is introduced; it attends to the neighborhood of each pixel along eight directions and captures anomalies between pixels in the inpainted regions. In the experiments, videos produced by four recent deep video inpainting methods, VI, OP, DSTT, and FGVC, were used to compare the proposed method with the existing deep video inpainting detection methods HPF and VIDNet. The proposed method outperforms HPF and achieves performance comparable to VIDNet with only one-fifth of VIDNet's encoder parameters.
The results show that the proposed method exploits multi-scale dual-modal features and the eight-direction local attention module to model correlations between pixels, uses cascaded ConvGRUs to capture temporal anomalies, and achieves accurate pixel-level localization of tampered regions.
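The abstract does not specify how the ELA frames are computed. A minimal sketch of standard error-level analysis, assuming the usual recipe of recompressing a frame as JPEG at a fixed quality and taking the absolute reconstruction error; the quality setting and function name are illustrative, not taken from the paper:

```python
# Sketch of error-level analysis (ELA), the second input modality named in the
# abstract. Quality level (90) and the helper name are assumptions.
import io

import numpy as np
from PIL import Image


def ela_frame(frame: np.ndarray, quality: int = 90) -> np.ndarray:
    """Recompress a frame as JPEG and return the absolute reconstruction error."""
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    recompressed = np.asarray(Image.open(buf), dtype=np.int16)
    # Inpainted regions tend to show a different error level than pristine ones.
    return np.abs(frame.astype(np.int16) - recompressed).astype(np.uint8)


# Synthetic 64x64 RGB frame stands in for a real video frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
ela = ela_frame(frame)
print(ela.shape, ela.dtype)  # (64, 64, 3) uint8
```

The ELA frame has the same spatial shape as the RGB frame, which is what allows the two modalities to be fed into the encoder in parallel and fused channel-wise.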
Key words:  Deep video inpainting; Video forgery detection; Cascaded ConvGRU; Local attention module; Spatio-temporal prediction
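The eight-direction local attention module is described only as attending to each pixel's neighborhood along eight directions. A minimal numpy sketch of that idea, gathering the eight directional neighbors and weighting them by similarity to the center pixel; the similarity score and all names are illustrative assumptions, not the paper's implementation:

```python
# Sketch of eight-direction neighbor gathering plus a simple similarity-softmax
# weighting. This illustrates the idea only; the paper's module operates on
# learned feature maps, not this hand-written score.
import numpy as np

# The eight compass directions as (row, col) offsets.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def eight_direction_neighbors(feat: np.ndarray) -> np.ndarray:
    """Return an (8, H, W) stack of feat shifted in eight directions,
    edge-padded so every shift keeps the original shape."""
    padded = np.pad(feat, 1, mode="edge")
    h, w = feat.shape
    return np.stack([padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                     for dy, dx in DIRECTIONS])


def local_attention(feat: np.ndarray) -> np.ndarray:
    """Weight the eight neighbors by softmax similarity to the center pixel."""
    nbrs = eight_direction_neighbors(feat)
    scores = -np.abs(nbrs - feat)          # higher = more similar (assumption)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0)         # softmax over the 8 directions
    return (weights * nbrs).sum(axis=0)


feat = np.arange(9, dtype=float).reshape(3, 3)
nbrs = eight_direction_neighbors(feat)
out = local_attention(feat)
print(nbrs.shape, out.shape)  # (8, 3, 3) (3, 3)
```

Pixels inside an inpainted region tend to relate to their neighbors differently than pristine pixels, which is the anomaly such a directional neighborhood comparison is meant to expose.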