Black-Box Text Watermarking Method Based on Generative Synonym Mining

Wang Zihang¹, Chen Bing¹, Li Zhengxiao²

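Cite this article:

Wang Zihang, Chen Bing, Li Zhengxiao. Black-box text watermarking method based on generative synonym mining[J]. Journal of Cyber Security, accepted.
Wang Zihang, Chen Bing, Li Zhengxiao. Black-Box Text Watermarking via Adversarial Synonym Generation[J]. Journal of Cyber Security, accepted.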

(1. The Third Research Institute of the Ministry of Public Security; 2. Wuhan Digital Engineering Institute)

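Abstract:

As text generated by large language models (LLMs) becomes ever more realistic and natural, the need to control its potential misuse is growing more urgent. Text watermarking is a proactive defense technique that embeds covert identifiers in generated content so that LLM-generated text can be reliably identified. However, mainstream text watermarking techniques mostly rely on the LLM's internal probability distribution, which makes them hard to apply in black-box settings where only the final output is available and severely limits their range of application. To address this problem, this paper proposes GSYN-Watermark, a black-box text watermarking method based on generative synonym mining. Without accessing the target model's internal parameters, the method uses a lightweight LLM to generate high-quality, context-dependent candidate synonym sets for selected words. On this basis, it replaces the conventional linear, word-by-word substitution scheme with a holistic, LLM-based generative embedding scheme: using the LLM's semantic understanding, it selects from the candidate space the word-sequence combination that satisfies the watermark-bit constraints with minimal semantic drift, and generates the final watermarked text from it. Experiments on multiple benchmark datasets show that, compared with existing lexical-substitution watermarking methods, the method significantly improves semantic stealthiness, textual naturalness, and runtime efficiency on long texts, providing an effective solution for detecting generated text in black-box settings.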
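The constrained selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in GSYN-Watermark an LLM chooses the word sequence, whereas here that judgment is replaced by hypothetical numeric costs (a per-candidate semantic-shift cost plus a pairwise penalty between consecutive slots), and the cheapest bit-compatible combination is found jointly with Viterbi-style dynamic programming. The keyed-hash bit assignment (`keyed_bit`, HMAC-SHA256), the fallback of keeping the original word when no candidate carries the required bit, and all names (`select_sequence`, `pair_cost`, `target_bits`) are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (word, hypothetical semantic-shift cost); lower is better


def keyed_bit(key: bytes) -> Callable[[str], int]:
    """Hypothetical bit assignment: lowest bit of HMAC-SHA256(key, lowercased word)."""
    def bit_of(word: str) -> int:
        digest = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).digest()
        return digest[-1] & 1
    return bit_of


def select_sequence(
    slots: List[List[Candidate]],
    target_bits: List[int],
    bit_of: Callable[[str], int],
    pair_cost: Callable[[str, str], float],
) -> Tuple[List[str], List[int]]:
    """Choose one word per slot jointly (Viterbi-style DP).

    Slot i may only use candidates whose bit equals target_bits[i % len(target_bits)].
    If none qualifies, the slot keeps its original word (candidate 0) and is reported
    as skipped. Among the remaining sequences, the one minimising the summed shift
    costs plus pair_cost over consecutive slots is returned.
    """
    if not slots:
        return [], []
    allowed: List[List[Candidate]] = []
    skipped: List[int] = []
    for i, cands in enumerate(slots):
        bit = target_bits[i % len(target_bits)]
        ok = [c for c in cands if bit_of(c[0]) == bit]
        if not ok:
            ok = [cands[0]]  # fall back to the original word; this slot carries no bit
            skipped.append(i)
        allowed.append(ok)

    # best[i][j]: lowest total cost for slots 0..i when slot i uses allowed[i][j]
    best: List[List[float]] = [[cost for _, cost in allowed[0]]]
    back: List[List[int]] = [[-1] * len(allowed[0])]
    for i in range(1, len(allowed)):
        row: List[float] = []
        ptr: List[int] = []
        for word, cost in allowed[i]:
            # cost of arriving at `word` from each candidate of the previous slot
            arrive = [best[i - 1][k] + pair_cost(prev, word)
                      for k, (prev, _) in enumerate(allowed[i - 1])]
            k_best = min(range(len(arrive)), key=arrive.__getitem__)
            row.append(cost + arrive[k_best])
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    chosen: List[str] = []
    for i in range(len(allowed) - 1, -1, -1):
        chosen.append(allowed[i][j][0])
        j = back[i][j]
    chosen.reverse()
    return chosen, skipped


if __name__ == "__main__":
    def by_length(word: str) -> int:
        return len(word) % 2  # toy bit function so the run is predictable

    slots = [
        [("big", 0.0), ("large", 0.2), ("huge", 0.5)],
        [("dog", 0.0), ("hound", 0.3), ("puppy", 0.4)],
        [("barked", 0.0), ("howled", 0.2)],
    ]
    words, skipped = select_sequence(
        slots,
        target_bits=[1],
        bit_of=by_length,
        pair_cost=lambda a, b: 2.0 if (a, b) == ("big", "dog") else 0.0,
    )
    print(words, skipped)  # ['large', 'dog', 'barked'] [2]
```

The pairwise term is what separates joint selection from greedy word-by-word substitution: in the toy run, taking the cheapest bit-compatible word in each slot would give "big dog" at a total cost of 2.0, while the joint search returns "large dog" at cost 0.2. The third slot has no candidate with bit 1, so it keeps its original word and is reported as skipped.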

Keywords: large language models; text watermarking; digital watermarking; content security; AI security


Received: 2026-02-28; Revised: 2026-05-12

Funding: National Basic Research Program of China (973 Program), 2025 Outstanding Young Researchers Class Program

Black-Box Text Watermarking via Adversarial Synonym Generation

Wang Zihang¹, Chen Bing¹, Li Zhengxiao²

(1. The Third Research Institute of the Ministry of Public Security; 2. Wuhan Digital Engineering Institute)

Abstract:

The increasing realism and naturalness of text generated by Large Language Models (LLMs) have intensified the urgency of mitigating their potential for misuse. Text watermarking is a proactive defense technique that identifies LLM-generated content by embedding covert identifiers in the generated text. However, prevailing watermarking methods rely on the LLM's internal probability distributions, so they cannot be applied in black-box scenarios where only the final output is accessible, which severely constrains their practical deployment. To address this limitation, we propose GSYN-Watermark, a black-box text watermarking method based on generative synonym mining. Without requiring access to the target model's internal parameters, our approach uses a lightweight LLM to generate high-quality, context-aware candidate synonym sets for selected words. Building on these sets, we replace conventional word-by-word substitution with a holistic generative embedding paradigm: exploiting the LLM's semantic understanding, it selects from the candidate space the word-sequence combination that satisfies the watermark bit constraints while minimizing semantic divergence, thereby producing the final watermarked text. Experiments on multiple benchmark datasets show that, compared with existing lexical-substitution watermarking methods, our approach substantially improves semantic stealthiness, textual naturalness, and runtime efficiency on long texts, offering an effective solution for generated-text detection in black-box settings.
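The abstract does not describe the detector. A common way to test a keyed lexical-substitution watermark, assumed here purely for illustration, is a one-proportion z-test: in unwatermarked text each substitutable word carries bit 1 with probability about 0.5, so the count of matching bits $h$ over $n$ words gives $z = (h - n/2)/\sqrt{n/4}$, and a large $z$ signals the watermark. The sketch assumes the detector can re-identify the substitutable words and that the embedded pattern is a constant bit; `detection_z`, `is_watermarked`, the HMAC-SHA256 bit function and the threshold of 4.0 are hypothetical choices, not GSYN-Watermark's actual detector.

```python
import hashlib
import hmac
import math
from typing import Callable, Iterable


def keyed_bit(key: bytes) -> Callable[[str], int]:
    """Hypothetical bit assignment: lowest bit of HMAC-SHA256(key, lowercased word)."""
    def bit_of(word: str) -> int:
        digest = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).digest()
        return digest[-1] & 1
    return bit_of


def detection_z(slot_words: Iterable[str], bit_of: Callable[[str], int],
                target_bit: int = 1) -> float:
    """One-proportion z-score of bit matches against the null rate 0.5.

    Under the null (unwatermarked text), each word's keyed bit is assumed to be an
    independent fair coin, so hits ~ Binomial(n, 0.5) with mean n/2 and variance n/4.
    """
    words = list(slot_words)
    n = len(words)
    if n == 0:
        return 0.0  # no evidence either way
    hits = sum(1 for w in words if bit_of(w) == target_bit)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


def is_watermarked(slot_words: Iterable[str], bit_of: Callable[[str], int],
                   threshold: float = 4.0, target_bit: int = 1) -> bool:
    """Flag text whose z-score reaches a hypothetical decision threshold."""
    return detection_z(slot_words, bit_of, target_bit) >= threshold


if __name__ == "__main__":
    def by_length(word: str) -> int:
        return len(word) % 2  # toy bit function so the run is predictable

    print(detection_z(["large", "dog", "hound", "puppy"], by_length))  # 2.0
    print(is_watermarked(["dog"] * 16, by_length))  # True
```

With a fixed threshold the test needs enough substitutable words: 16 matching words reach $z = 4.0$, while 15 give only about 3.87, which is one reason substitution-based watermarks are easier to detect in longer texts.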

Keywords: large language models; text watermarking; digital watermarking; content security; artificial intelligence security