Ada-SIR: A Parameter-Adaptive Cross-Language Robust Watermarking Method

Wang Luyao; Liu Changjun; Fu Haocheng; Yi Xiaowei; Kuang Xiaohui; Wu Zhendong

Cite this article:

Wang Luyao, Liu Changjun, Fu Haocheng, Yi Xiaowei, Kuang Xiaohui, Wu Zhendong. Ada-SIR: A Parameter-Adaptive Cross-Language Robust Watermarking Method[J]. Journal of Cyber Security, accepted.

Wang Luyao¹, Liu Changjun¹, Fu Haocheng¹, Yi Xiaowei¹, Kuang Xiaohui², Wu Zhendong²
(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. Institute of Systems Engineering, Academy of Military Science)

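Abstract:

In recent years, the large-scale deployment of large language models (LLMs) has driven rapid advances in text generation, but the resulting risk of malicious content misuse has made text provenance tracing and copyright protection increasingly urgent. Text watermarking, a core technique for identifying content and tracing its origin, is widely applied to documents such as academic papers, news articles, and core enterprise data. However, as digital content spreads globally and across languages, attackers often use machine translation to perform meaning-preserving cross-lingual conversion of watermarked text, retaining the core semantics while rebuilding the surface features that watermark embedding depends on, and thereby evading tracing. Most existing watermarking methods embed the watermark using monolingual surface features, so in cross-lingual scenarios those features are lost, accurate extraction becomes difficult, and robustness is severely lacking. To address this problem, this paper proposes Ada-SIR, a parameter-adaptive cross-lingual robust watermarking method. The method introduces two generation modules and uses multi-layer perceptrons (MLPs) to build a nonlinear mapping from the high-dimensional semantic feature space to the watermark parameter decision space. By sensing fine-grained changes in contextual semantics in real time, it computes, end to end, the optimal watermark parameters for the current context, which significantly strengthens robustness while preserving imperceptibility. In addition, the paper constructs multilingual semantic clusters to handle semantic alignment, ensuring that semantically equivalent tokens keep consistent watermark features after cross-lingual conversion. Experimental results show that, while maintaining the semantic consistency of LLM-generated text, the method raises the average watermark performance retention rate in cross-lingual scenarios by 6.82%, effectively mitigating watermark signal attenuation across languages and providing a more reliable solution for text copyright protection in multilingual environments and complex real-world applications.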
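As a rough illustration of the parameter-adaptive idea described in the abstract, the sketch below maps a context embedding to two watermark parameters with a small MLP. The abstract does not name the parameters or the architecture, so the choice of a strength `delta` and a ratio `gamma`, the layer sizes, the value ranges, and the function names `make_mlp` and `predict_params` are all assumptions made for illustration. The weights are random and untrained, so the outputs carry no meaning beyond showing the data flow.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create random, untrained two-layer MLP weights (illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
    b2 = [0.0] * out_dim
    return w1, b1, w2, b2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_params(embedding, mlp, delta_range=(0.5, 4.0), gamma_range=(0.25, 0.75)):
    """Map a semantic embedding to (delta, gamma).

    delta and gamma are hypothetical watermark parameters; the ranges are
    assumed bounds that keep the outputs in a plausible interval.
    """
    w1, b1, w2, b2 = mlp
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    # Linear output layer, one raw score per parameter.
    out = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    # Squash each raw score into its allowed range.
    lo, hi = delta_range
    delta = lo + (hi - lo) * sigmoid(out[0])
    lo, hi = gamma_range
    gamma = lo + (hi - lo) * sigmoid(out[1])
    return delta, gamma


mlp = make_mlp(in_dim=8, hidden_dim=16, out_dim=2)
context_embedding = [0.1 * i for i in range(8)]  # stand-in for a real sentence embedding
delta, gamma = predict_params(context_embedding, mlp)
print(f"delta={delta:.3f}, gamma={gamma:.3f}")
```

In a trained system the embedding would come from a semantic encoder and the weights from optimization against robustness and text-quality objectives; neither is specified in the abstract, so both are left out here.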

Keywords: large language model; text watermarking; parameter adaptation; cross-lingual; semantic clusters; robust watermarking

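DOI:

Submitted: 2026-02-06; Revised: 2026-05-12

Funding: National Natural Science Foundation of China (No. 62272456); National Key Research and Development Program of China (No. 2022QY0101)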

Ada-SIR: A Parameter-Adaptive Cross-Language Robust Watermarking Method

Wang Luyao¹, Liu Changjun¹, Fu Haocheng¹, Yi Xiaowei¹, Kuang Xiaohui², Wu Zhendong²

(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. Institute of Systems Engineering, Academy of Military Science)

Abstract:

In recent years, the large-scale deployment and application of Large Language Models (LLMs) have transformed text generation technologies. However, the accompanying risk of malicious content misuse has made text provenance and copyright protection increasingly urgent. Text watermarking, a core technology for identity attribution and provenance tracking, has been widely applied to documents such as academic papers, news reports, and proprietary corporate data. Nevertheless, as digital content spreads globally and across languages, adversaries frequently use machine translation to perform semantics-preserving cross-lingual transformations on watermarked text. This process retains the core semantics while reconstructing the surface-level features that watermark embedding relies on, thereby evading provenance tracking. Existing watermarking methods predominantly rely on monolingual surface features for embedding, so feature loss in cross-lingual scenarios often prevents accurate watermark extraction and severely undermines robustness. To address this issue, this paper proposes a parameter-adaptive cross-lingual robust watermarking method, termed Ada-SIR. The method introduces two generation modules and uses a Multi-Layer Perceptron (MLP) to construct a nonlinear mapping from a high-dimensional semantic feature space to a watermark parameter decision space. By capturing fine-grained changes in contextual semantics in real time, the method computes optimal watermark parameters for the current context in an end-to-end manner, significantly enhancing watermark robustness while ensuring imperceptibility. Furthermore, this paper constructs multilingual semantic clusters to handle semantic alignment, ensuring that semantically equivalent tokens maintain consistent watermark features after cross-lingual transformation. Experimental results demonstrate that, while preserving the semantic consistency of LLM-generated text, the method improves the average watermark performance retention rate in cross-lingual scenarios by 6.82%. This effectively alleviates watermark signal attenuation in cross-lingual settings and provides a more reliable solution for text copyright protection in multilingual environments and complex real-world applications.
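To make the role of multilingual semantic clusters more concrete, here is a minimal sketch of one way semantically equivalent tokens could share a watermark feature: the watermark decision is keyed on a cluster id instead of the token's surface form, so a translated token lands on the same side of the partition. The toy `CLUSTERS` table, the hash-based partition rule (borrowed from the general style of green-list watermarking), and the function names `is_green` and `green_fraction` are hypothetical and are not taken from the paper.

```python
import hashlib

# Hypothetical toy clusters: tokens in the same cluster are treated as
# semantically equivalent across English, Chinese, and French.
CLUSTERS = {
    "cat": 0, "猫": 0, "chat": 0,
    "dog": 1, "狗": 1, "chien": 1,
    "water": 2, "水": 2, "eau": 2,
}


def is_green(token, key, gamma=0.5):
    """Assign the token to the 'green' side based on its cluster id.

    Because the hash input is the cluster id and not the token string,
    every member of a cluster receives the same decision.
    """
    cluster_id = CLUSTERS.get(token)
    if cluster_id is None:
        return False  # tokens outside every cluster carry no signal in this sketch
    digest = hashlib.sha256(f"{key}:{cluster_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def green_fraction(tokens, key):
    """Fraction of clustered tokens that fall on the green side."""
    scored = [t for t in tokens if t in CLUSTERS]
    if not scored:
        return 0.0
    return sum(is_green(t, key) for t in scored) / len(scored)


english = ["cat", "dog", "water"]
chinese = ["猫", "狗", "水"]
print(green_fraction(english, "secret"), green_fraction(chinese, "secret"))
```

The two printed fractions are equal by construction, which is the property the abstract describes: the detection statistic survives translation as long as the translated tokens stay within the same clusters. How the clusters are actually built and aligned is not specified in the abstract.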

Key words: large language model; text watermark; parameter adaptation; cross-lingual; semantic clusters; robust watermarking