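Abstract: |
Fuzz testing is an efficient technique for discovering software vulnerabilities. It has produced a rich body of research and seen wide practical use in both academia and industry, giving rise to many fuzzing tools. These tools differ markedly in technical characteristics and performance, so they need to be evaluated through testing to guide tool selection and improvement. However, existing fuzzer evaluation methods commonly yield results that cannot be explained in some cases. We find that this is related to the fact that existing evaluations generally ignore fuzzing-hampering features. To address this, we study in depth how hampering features affect fuzz testing, summarize and distill 5 hampering features, propose a fine-grained comparative evaluation method that treats hampering features as a controlled variable, and use code synthesis to build Bench4I, a test suite containing 118 target programs. An evaluation of 6 different fuzzers shows that the method can accurately explain how target program samples affect the effectiveness of the fuzzers under test and thereby infer the tools' specific capabilities, effectively improving the interpretability of the evaluation. Based on the evaluation results, we offer usage and improvement suggestions for the fuzzers tested, and we put an improvement to QSYM into practice, with good results. |
Keywords: fuzz testing; evaluation; test suite; software vulnerability |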
DOI:10.19363/J.cnki.cn10-1380/tn.2022.12.11 |
Received: September 04, 2020; Revised: November 10, 2020 |
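Funding: This work was supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology; by the Key Program of the National Natural Science Foundation of China (No. 62032010); and by the Key Program of the National Natural Science Foundation of China (No. U1836209). |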
|
Evaluating Fuzzers Based on Fuzzing-hampering Features |
HAO Gaojian, LI Feng, HUO Wei, ZOU Wei |
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, Beijing 100195, China; Beijing Key Laboratory of Network Security and Protection Technology, Beijing 100195, China; University of Chinese Academy of Sciences, Beijing 100049, China |
Abstract: |
Fuzz testing is an efficient method for finding security-critical bugs. In recent years, a large body of work on fuzz testing has been proposed in both industry and academia, and a variety of fuzzing tools have been developed. These tools differ in technique and performance, so fuzzers need to be evaluated in order to understand them. However, many existing evaluations suffer from poor interpretability, which limits what can be learned from their results. In this paper, we find that evaluation results can be affected by many factors, including the fuzzing-hampering features contained in the target programs. Existing evaluations pay little attention to fuzzing-hampering features, which leaves them unable to explain the reasons behind their results and can even lead to unclear or erroneous conclusions. To address this, we propose a method for evaluating fuzzers based on fuzzing-hampering features. Our method treats fuzzing-hampering features as one of the controlled variables and performs fine-grained comparative testing to relate evaluation results to fuzzing-hampering features and identify the causes of differing results, making the evaluation more interpretable. We also develop a method for constructing benchmarks in which fuzzing-hampering features can serve as a controlled variable during evaluation. To implement the idea and show its effectiveness, we summarize 5 fuzzing-hampering features, quantitatively define how to calculate indicators of a fuzzer's capabilities, and construct a bug benchmark named Bench4I, which includes 118 synthetic programs with different fuzzing-hampering features. In the experiment, we evaluate 6 fuzzers. The results show that the tools' detailed capabilities can be inferred from the indicators calculated from the evaluation results, so the evaluation results become more interpretable.
With the help of the evaluation, we also offer several suggestions for using and improving these fuzzers. We put the improvement of QSYM into practice and obtained encouraging results. |
Key words: fuzz testing; evaluation; benchmark; security-critical bug |