InterceptRAG: Knowledge Tampering Attack Against Retrieval-Augmented Generation

孙浩程; 袁得嵛; 邓钰洋; 廖望; 严智涵; 孔瑜倩

Cite this article:

Sun Haocheng, Yuan Deyu, Deng Yuyang, Liao Wang, Yan Zhihan, Kong Yuqian. InterceptRAG: Knowledge Tampering Attack Against Retrieval-Augmented Generation[J]. Journal of Cyber Security, accepted.

(People's Public Security University of China)

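Abstract:

Retrieval-augmented generation (RAG) markedly improves the factual accuracy of large language models by integrating external knowledge bases, but its distributed architecture also introduces new security risks. Current research on attacks against RAG systems concentrates on knowledge base poisoning, which usually requires the attacker to have permission to modify the knowledge base and is constrained by audit detection and by the uncertainty of retrieval recall. This paper instead targets the interceptability of the data flow between retrieval and generation and proposes InterceptRAG, a post-retrieval knowledge tampering attack framework for RAG systems. The framework does not contaminate the underlying knowledge base; it intercepts and tampers with the retrieval results after retrieval completes and before generation begins, and adapts the attack to the type of user query: for closed-ended questions it injects false facts to induce incorrect answers, and for open-ended questions it injects biased content to steer the stance of the model's output. To keep the tampered content both effective and stealthy, the paper introduces a multi-document consistency constraint and a semantic preservation mechanism, so that the tampered documents agree on their core conclusion while remaining highly similar in meaning to the originals, which lowers the risk of being flagged by semantic anomaly detection. Experiments on multiple benchmark datasets and several mainstream models show attack success rates of 88% to 100% on closed-ended questions and bias injection success rates above 82% on open-ended questions. Compared with existing knowledge poisoning methods, InterceptRAG achieves stronger attack performance and remains robust under mainstream defenses including perplexity filtering, semantic consistency detection, and knowledge expansion. The study exposes a potential security threat at the data transmission layer between retrieval and generation in RAG systems and can inform the design of more secure and trustworthy RAG architectures.

Keywords: large language models; retrieval-augmented generation; knowledge tampering; bias injection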

DOI:

Submitted: 2026-03-16; Revised: 2026-05-27

Funding: Technology Research Program of the Ministry of Public Security

InterceptRAG: Knowledge Tampering Attack Against Retrieval-Augmented Generation

Sun Haocheng, Yuan Deyu, Deng Yuyang, Liao Wang, Yan Zhihan, Kong Yuqian

(People's Public Security University of China)

Abstract:

Retrieval-Augmented Generation (RAG) significantly improves the factual accuracy of large language models by integrating external knowledge bases, yet its distributed architecture also introduces new security risks. Existing studies on attacks against RAG systems mainly focus on knowledge base poisoning. Such methods typically require attackers to have permission to modify the knowledge base, and are susceptible to security auditing and the uncertainty of retrieval recall, thereby exhibiting considerable limitations. To address this issue, this paper focuses on the interceptability of the data flow between retrieval and generation, and proposes InterceptRAG, a post-retrieval knowledge tampering attack framework for RAG systems. Without contaminating the underlying knowledge base, this framework intercepts and tampers with retrieval results only after retrieval is completed and before generation begins, and adaptively launches targeted attacks according to user query types: for closed-ended questions, it injects false facts to induce the model to output incorrect answers; for open-ended questions, it injects biased content to manipulate the stance of the model's responses. To ensure the effectiveness and stealthiness of the tampered content, this paper introduces a multi-document consistency constraint and a semantic preservation mechanism, enabling the tampered documents to remain consistent in their core conclusions while maintaining high semantic similarity to the original documents, thereby reducing the risk of being identified by semantic anomaly detection methods. Experimental results show that, across multiple benchmark datasets and mainstream models, InterceptRAG achieves an attack success rate of 88% to 100% on closed-ended questions and a bias injection success rate of over 82% on open-ended questions. Compared with existing knowledge poisoning methods, InterceptRAG achieves superior attack performance and exhibits strong robustness under mainstream defense mechanisms, including perplexity filtering, semantic consistency detection, and knowledge expansion. This study reveals potential security threats at the data transmission layer between retrieval and generation in RAG systems, and provides a reference for building safer and more trustworthy RAG architectures.

Key words: large language models; retrieval-augmented generation; knowledge tampering; bias injection
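
The two constraints named in the abstract, semantic preservation and multi-document consistency, can be made concrete with a small check. The sketch below is illustrative only and is not the authors' implementation: the abstract does not say how similarity is measured, so a bag-of-words cosine stands in for whatever embedding model the paper uses, and the function names, the substring test for the shared conclusion, and the 0.8 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Lower-cased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = bow_vector(a), bow_vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def satisfies_constraints(originals, modified, target_claim, min_similarity=0.8):
    """Check both constraints on a set of retrieved documents.

    Semantic preservation: each modified document stays close to its original.
    Multi-document consistency: every modified document states the same claim
    (approximated here by a case-insensitive substring match, an assumption).
    """
    if len(originals) != len(modified):
        raise ValueError("originals and modified must be aligned one-to-one")
    preserved = all(
        cosine_similarity(o, m) >= min_similarity
        for o, m in zip(originals, modified)
    )
    consistent = all(target_claim.lower() in m.lower() for m in modified)
    return preserved and consistent


if __name__ == "__main__":
    originals = [
        "The bridge opened in 1932 and spans the harbour.",
        "Construction finished and the bridge opened in 1932.",
    ]
    modified = [doc.replace("1932", "1952") for doc in originals]
    print(satisfies_constraints(originals, modified, "opened in 1952"))
```

The same measurement is relevant to the semantic anomaly detection mentioned in the abstract: a change confined to one token leaves the similarity score high, which is why a similarity threshold alone may not flag it.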