Large Model-Based Dynamic Detection of PyPI Supply Chain Attacks

Liu Liang

Cite this article:

Liu Liang. Large Model-Based Dynamic Detection of PyPI Supply Chain Attacks[J]. Journal of Cyber Security, accepted.
LIU LIANG. Large Model-Assisted Dynamic Detection of Supply Chain Attacks[J]. Journal of Cyber Security, accepted.

(School of Cyber Science and Engineering, Sichuan University)

Abstract:

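Python's wide adoption and its heavy reliance on third-party open-source libraries have made the Python Package Index (PyPI) a key node in the modern software supply chain and a primary attack target. Attackers typically exploit the dynamic execution features of the Python language, social engineering, and code obfuscation to bypass traditional detection and mount supply chain attacks against the PyPI repository, seriously threatening the security of critical-infrastructure information systems. Existing defenses, however, usually rely on offline, large-scale scanning and commonly suffer from delayed response and heavy performance overhead. This paper proposes a real-time detection framework based on runtime hijacking: by embedding security checks at the Python interpreter level, it can intercept supply chain attacks in real time at an early stage. The framework also introduces large language model (LLM)-assisted identification: an LLM pre-screens packages, and on that basis calls to threat sources are dynamically hijacked so that the execution context can be analyzed, yielding detection accuracy better than that of current mainstream centralized offline detection methods. In addition, early pruning and LLM-driven preprocessing optimizations reduce the framework's runtime overhead. In the experimental analysis, the framework is validated on 2,840 PyPI packages; the results show 91.9% accuracy and 95.8% recall with only 24% runtime overhead, which makes deployment in production environments and integration into CI/CD pipelines feasible. The study shows that a runtime defense mechanism built at the interpreter level and augmented with an LLM offers a practical, high-fidelity solution for protecting dynamic-language ecosystems against continuously evolving supply chain threats.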

Key words: supply chain attacks, runtime detection, dynamic hijacking, PyPI security


Received: 2025-07-14; Revised: 2026-01-04


Large Model-Assisted Dynamic Detection of Supply Chain Attacks

LIU LIANG

(School of Cyber Science and Engineering, Sichuan University)

Abstract:

Python's extensive usage across diverse domains, from data science and web development to AI systems, combined with its deep dependence on third-party open-source libraries, has made the Python Package Index (PyPI) a cornerstone of the modern software supply chain. However, this centrality also positions PyPI as a prime target for supply chain attacks. Adversaries typically leverage Python's dynamic nature, deceptive naming, social engineering, and sophisticated code obfuscation to bypass static analysis and evade detection, posing severe risks to national and enterprise critical infrastructure. Current mitigation strategies mostly rely on offline, large-scale repository scanning; such approaches are intrinsically reactive, suffer from high latency, and incur substantial computational costs, limiting their applicability in real-world deployment. To bridge this gap, this paper introduces RT-Sentry, a novel runtime interception-based detection framework designed for proactive, real-time supply chain threat identification. Our core insight is to instrument the Python interpreter itself: by hooking into key execution pathways, RT-Sentry injects lightweight security checks that monitor behavior as it happens. Crucially, rather than applying heavyweight analysis indiscriminately, the framework incorporates a two-stage pipeline. First, a lightweight Large Language Model (LLM)-assisted pre-filter rapidly screens incoming or imported packages by correlating package metadata, code patterns, and historical threat intelligence to generate a shortlist of high-risk candidates. Second, only these candidates trigger deep context-aware runtime interception, in which dynamic hooks capture execution traces, environment variables, network destinations, and shell command invocations for fine-grained analysis. Experimental validation was conducted on a curated dataset of 2,840 PyPI packages, including 1,974 confirmed malicious samples. RT-Sentry achieved 91.9% accuracy and 95.8% recall, significantly outperforming signature-based and purely static approaches, while incurring only 24% average runtime overhead, a favorable trade-off that enables integration into CI/CD pipelines and production environments. In summary, this work demonstrates that a hybrid defense architecture, combining interpreter-level instrumentation with LLM-guided threat prioritization, offers a scalable, low-latency, and high-precision solution for securing dynamic-language ecosystems against evolving supply chain threats. It underscores the viability of runtime as the new frontier for supply chain security.
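The two-stage idea in the abstract (pre-filter first, intercept only flagged packages) can be illustrated with a minimal sketch. This is not RT-Sentry's implementation, which the abstract does not detail. It assumes CPython 3.8+ and uses the standard `sys.addaudithook` mechanism as a stand-in for interpreter-level hooking. `RuntimeMonitor`, `InterceptedCall`, `prefilter`, `guarded`, and the `SENSITIVE_EVENTS` set are hypothetical names, and the keyword heuristic merely marks the place where an LLM-based pre-filter would sit.

```python
import os
import sys
from contextlib import contextmanager

# Assumption: audit events treated as "threat sources" in this sketch.
SENSITIVE_EVENTS = {"os.system", "subprocess.Popen", "socket.connect"}


class InterceptedCall(RuntimeError):
    """Raised by the hook to stop a sensitive call before it executes."""


class RuntimeMonitor:
    """Hypothetical monitor; not RT-Sentry's actual implementation."""

    def __init__(self):
        self.active = False  # True only while a flagged package is running
        self.trace = []      # (event, args) pairs kept for later analysis

    def hook(self, event, args):
        # Early pruning: ignore everything unless a flagged package is
        # active and the event is one of the sensitive ones.
        if not self.active or event not in SENSITIVE_EVENTS:
            return
        self.trace.append((event, repr(args)))
        raise InterceptedCall(f"blocked {event}")


def prefilter(source_text):
    """Stage 1 stand-in: a keyword heuristic where an LLM call would go."""
    return any(tok in source_text for tok in ("os.system", "exec(", "b64decode"))


@contextmanager
def guarded(monitor, flagged):
    """Stage 2: enable interception only for packages flagged in stage 1."""
    monitor.active = flagged
    try:
        yield monitor
    finally:
        monitor.active = False


monitor = RuntimeMonitor()
sys.addaudithook(monitor.hook)  # audit hooks cannot be removed once added

snippet = "import os; os.system('echo pwned')"  # illustrative package code
blocked = False
with guarded(monitor, prefilter(snippet)):
    try:
        os.system("echo pwned")  # the hook raises before the shell runs
    except InterceptedCall:
        blocked = True

print(blocked, [event for event, _ in monitor.trace])
```

Because the hook returns immediately for unflagged packages and non-sensitive events, the common path stays cheap, which mirrors the early-pruning motivation stated in the abstract. A production system would also need to withstand tampering by code running in the same process, which this sketch does not attempt.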

Key words: supply chain attacks, runtime detection, dynamic hijacking, PyPI security
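For readers who want the reported metrics in concrete terms, the sketch below computes accuracy, recall, and relative runtime overhead using their conventional definitions. The abstract does not give the confusion matrix or the timing protocol, so the counts and timings here are purely illustrative placeholders, not results from the paper, and the function names are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all packages (malicious and benign) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of malicious packages that were detected."""
    return tp / (tp + fn)


def relative_overhead(baseline_s, guarded_s):
    """Extra runtime with monitoring enabled, relative to the baseline."""
    return (guarded_s - baseline_s) / baseline_s


# Illustrative placeholder values only; NOT taken from the paper.
tp, fn, tn, fp = 90, 10, 80, 20
acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
ovh = relative_overhead(baseline_s=1.00, guarded_s=1.24)

print(f"accuracy={acc:.3f} recall={rec:.3f} overhead={ovh:.0%}")
```

High recall matters most in this setting because a missed malicious package (a false negative) executes unchecked, whereas a false positive only costs an extra review.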