  • Song Zhenyu, Wang Jiawei, Song Chen, Song Yanni, Zhao Yuanbo, Guo Xuan. Crash Deduplication Based on Symbolic Constraint Comparison [J]. Journal of Cyber Security, accepted.

Crash Deduplication Based on Symbolic Constraint Comparison
Song Zhenyu, Wang Jiawei, Song Chen, Song Yanni, Zhao Yuanbo, Guo Xuan
(Institute of Information Engineering, Chinese Academy of Sciences)
Fuzzing tools let security analysts discover a large number of crash samples in a short time, but deduplicating those samples remains a serious challenge in practical security analysis. Existing methods based on crash location and call stack often suffer from over-clustering, which reduces their effectiveness, while approaches based on vulnerability root causes typically incur high computational cost, which makes them impractical at large scale. To address these challenges, this paper presents a crash deduplication approach based on symbolic constraint comparison. The method first identifies the key bytes of each crash sample through byte-by-byte transformation and testing, in order to determine the root cause of the vulnerability. It then collects symbolic path constraints to extract the control-flow features of each crash sample, which characterize that crash's execution pattern. Finally, it computes a similarity matrix over these control-flow features and clusters the samples with both spectral clustering and DBSCAN, so that it can adapt to different vulnerability distributions.
The approach was evaluated for effectiveness and efficiency on datasets from seven target programs selected from the Magma and MoonLight benchmarks. The experimental dataset contains 23 CVE vulnerabilities in 8 classes. The results show that the proposed method avoids the over-clustering problem while achieving high clustering accuracy across vulnerability types, with F1 scores of 99.99%, 99.69%, and 97.16% on the Poppler, SoX, and LibTIFF datasets, respectively. In terms of computational performance, the CPU time consumed by the method is about one-thousandth of that of existing approaches, which makes it well suited to large-scale software projects.
Key words: crash deduplication; root cause of vulnerabilities; symbolic constraints; software vulnerabilities
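
The first and third steps described in the abstract (key-byte identification by byte-by-byte transformation, and a similarity matrix over per-crash features) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the bit-flip transformation, the `toy_crashes` stand-in for running the target program, the representation of control-flow features as sets of constraint strings, and the use of Jaccard similarity. The abstract does not specify any of these details.

```python
from typing import Callable, FrozenSet, List


def find_key_bytes(sample: bytes, crashes: Callable[[bytes], bool]) -> List[int]:
    """Return the offsets whose transformation makes the crash disappear.

    Assumption: each byte is transformed by flipping all of its bits, and
    `crashes` re-runs the (hypothetical) target on the mutated input.
    """
    key_offsets = []
    for offset in range(len(sample)):
        mutated = bytearray(sample)
        mutated[offset] ^= 0xFF  # transform exactly one byte
        if not crashes(bytes(mutated)):
            key_offsets.append(offset)  # the crash depends on this byte
    return key_offsets


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(features: List[FrozenSet[str]]) -> List[List[float]]:
    """Pairwise similarity between the feature sets of all crash samples."""
    return [[jaccard(x, y) for y in features] for x in features]


def toy_crashes(data: bytes) -> bool:
    """Hypothetical target: crashes on a 'PK' header with a large byte 4."""
    return len(data) > 4 and data[:2] == b"PK" and data[4] > 0x7F


sample = b"PK\x00\x00\xff\x00"
key_bytes = find_key_bytes(sample, toy_crashes)

# Hypothetical constraint sets collected for three crash samples.
features = [
    frozenset({"c1", "c2", "c3"}),
    frozenset({"c1", "c2", "c4"}),
    frozenset({"c9"}),
]
matrix = similarity_matrix(features)
```

In this toy setting only offsets 0, 1, and 4 influence the crash, and the first two samples share half of their combined constraints while the third shares none. A matrix of this kind is what a clustering step such as spectral clustering or DBSCAN would consume. Note that a single fixed transformation can miss a key byte whose transformed value still triggers the crash, so a real implementation would likely try several values per byte.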