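| Abstract: |
| With the widespread adoption of social media, automated media editing and generation technologies have become an important part of content creation. While these technologies substantially lower the barrier to creation and facilitate the spread of information, they also accelerate the production and dissemination of maliciously manipulated content. Manipulated media can fuel the spread of disinformation and thereby pose a serious threat to social trust and ethics. Media manipulation detection aims to determine whether media content has been manipulated and to precisely localize the manipulated regions; it can be used to identify fake content and to safeguard the authenticity and credibility of data, and is therefore of great importance for network information security. As manipulation techniques have gradually shifted from Photoshop-based editing to generative manipulation, the realism and naturalness of synthesized content have steadily improved, posing unprecedented challenges to conventional detection methods and driving the continued evolution of manipulated-media detection. This evolution follows three trends: from coarse-grained multimodal alignment and fusion to multi-dimensional alignment and fine-grained mining of manipulation clues; from simple binary classification to a unified "classification-localization-explanation" capability; and from analysis of low-level forgery traces to multi-perspective manipulation analysis that also draws on high-level logical patterns and semantic information. In recent years, in response to the difficulty of adapting detection techniques to real-world scenarios, researchers have proposed robust detection and cross-domain manipulation detection techniques that improve robustness and generalization. This paper systematically introduces the concept of media manipulation detection and discusses the main challenges facing current research and future development trends. It is the first survey to systematically review multimodal techniques for manipulated-media detection, explainable detection driven by multimodal large models, and the cross-domain and robustness extensions developed over the past five years. |
| Keywords: media manipulation detection; generative forgery; multi-perspective tampering analysis; robust detection |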
| DOI: |
| Received: 2026-02-24; Revised: 2026-04-01 |
| Funding: National Basic Research Program of China (973 Program) |
|
| A Survey on Media Content Manipulation Detection Methods |
|
| Zhang Peng, Zhang Kexin, Qin Xugong |
|
| (Nanjing University of Science and Technology) |
| Abstract: |
| The transition from Photoshop-based manipulation to generative forgery has markedly enhanced the realism and naturalness of synthesized content, posing unprecedented challenges to conventional detection methods and driving the evolution of multimodal manipulation detection technologies. The developmental trajectory of these methods can be delineated along three dimensions: a progression from coarse-grained multimodal alignment and fusion to multi-dimensional alignment coupled with fine-grained tampering clue mining; a transition from simple binary classification to a tripartite "classification-localization-interpretation" framework; and an expansion from low-level forgery trace analysis to multi-perspective tampering assessment incorporating high-level logical patterns and semantic information. In recent years, to address the difficulty of adapting detection techniques to real-world scenarios, researchers have developed robust detection and cross-domain tampering detection methods that improve both robustness and generalization. This paper provides a systematic introduction to media tampering detection, examining the primary challenges facing current research and exploring future development trends. It constitutes the first comprehensive review that systematically synthesizes multimodal technologies for media manipulation detection, explainable detection driven by multimodal large language models, and cross-domain/robustness extension techniques developed over the past five years. |
| Key words: media manipulation detection; generative forgery; multi-perspective tampering analysis; robust detection |