A Survey on Vulnerability Patch Management and Patch Identification Techniques in Open-Source Software

SUN Qing; XIAO Yang; XU Lili; SUN Tianqi; HUO Wei

Cite this article:

孙晴,肖扬,许丽丽,孙天琦,霍玮.开源漏洞补丁管理现状与识别技术研究综述[J].信息安全学报,已采用
SUN Qing, XIAO Yang, XU Lili, SUN Tianqi, HUO Wei. A Survey on Vulnerability Patch Management and Patch Identification Techniques in Open-Source Software[J]. Journal of Cyber Security, accepted

SUN Qing¹, XIAO Yang¹, XU Lili², SUN Tianqi¹, HUO Wei¹
(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. Individual)

Abstract:

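As open-source software ecosystems grow increasingly complex, software vulnerabilities have intensified the security challenges facing the software supply chain. As the key carrier of vulnerability-fix information, a patch not only directly serves vulnerability remediation but also underpins multiple security tasks such as vulnerability detection and patch porting. This paper systematically surveys and analyzes how mainstream vulnerability databases collect and manage patches, and summarizes and evaluates the development and identification capability of current automated vulnerability patch identification techniques. First, this study constructs OCVP-DB, the first standardized evaluation dataset focused on patch completeness; it covers four programming languages, and more than 70% of its vulnerability patches consist of multiple commits. Second, the paper analyzes mainstream vulnerability databases, open-source vulnerability patch datasets, and vulnerability patch identification techniques. Finally, it quantitatively evaluates these three classes of objects: patch collection and management in mainstream vulnerability databases, the accuracy of existing open-source vulnerability patch datasets, and the effectiveness of representative patch identification techniques. The experimental results show that nearly 80% of vulnerability entries in the vulnerability databases provide no patch, and that existing patch datasets have clear limitations in coverage and update timeliness; for example, the patch commit completeness of the VulasDB dataset is below 40%. As for patch identification techniques, methods based on information correlation matching perform well on highly standardized projects, with the multi-source information method reaching 68.85% precision and 55.99% recall, whereas methods based on model ranking reach a recall of only 23.10% even in the best case. Beyond systematically summarizing the state of the art, this study provides an important reference, through comprehensive evaluation, for improving patch identification techniques and building more complete patch datasets. In closing, after summarizing the shortcomings of current research, the paper discusses future trends and potential research directions for patch identification techniques.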

Keywords: vulnerability patches; patch identification techniques; vulnerability databases; patch datasets

DOI：

Received: 2025-01-17; Revised: 2025-02-27

Funding: National Key Research and Development Program of China; National Natural Science Foundation of China

A Survey on Vulnerability Patch Management and Patch Identification Techniques in Open-Source Software

SUN Qing¹, XIAO Yang¹, XU Lili², SUN Tianqi¹, HUO Wei¹

(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. Individual)

Abstract:

With the increasing complexity of open-source software ecosystems, software vulnerabilities have intensified the security challenges in software supply chains. As the key carriers of vulnerability-fix information, patches not only directly serve vulnerability remediation but also form the basis for multiple security tasks, including vulnerability detection and patch porting. This paper systematically investigates and analyzes patch collection and management in mainstream vulnerability databases, and summarizes and evaluates the development trajectory and identification capabilities of current automated vulnerability patch identification techniques. First, we introduce OCVP-DB, a novel standardized evaluation dataset focusing on patch completeness. This dataset covers four programming languages, and over 70% of its vulnerability patches comprise multiple commits. Second, we analyze mainstream vulnerability databases, open-source vulnerability patch datasets, and patch identification techniques. Third, we quantitatively evaluate these three aspects: patch collection and management in mainstream vulnerability databases, the accuracy of existing open-source patch datasets, and the effectiveness of representative patch identification techniques. Experimental results show that nearly 80% of vulnerability entries in the databases provide no patch, and that existing patch datasets have significant limitations in coverage and timeliness; for example, the patch commit completeness of VulasDB is below 40%. Regarding patch identification techniques, information-correlation-based matching methods perform well on highly standardized projects, with multi-source information approaches achieving 68.85% precision and 55.99% recall. In contrast, model-ranking-based methods achieve a recall of only 23.10% even under optimal conditions. This research not only systematically summarizes the current technological landscape but also, through extensive evaluation, provides a reference for improving patch identification techniques and constructing more comprehensive patch datasets. Finally, based on the identified research gaps, we discuss future trends and potential research directions in patch identification techniques.

Key words: Security Patches; Patch Identification Techniques; Vulnerability Databases; Patch Dataset