HTTPFuzzer: Reinforcement Learning Guided Greybox Fuzzing for Web Servers

Chen Qian; Hong Zheng; Jiang Chuan; Zhang Guomin; Qin Sujuan; Gu Jinbang; Cui Shuai

Cite this article:

Chen Qian, Hong Zheng, Jiang Chuan, Zhang Guomin, Qin Sujuan, Gu Jinbang, Cui Shuai. HTTPFuzzer: Reinforcement Learning Guided Greybox Fuzzing for Web Servers[J]. Journal of Cyber Security, accepted.

(Command and Control Engineering College, Army Engineering University of PLA)

Abstract:

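As a core component of Internet architecture, Web servers carry massive volumes of data exchange and business-logic processing, and their security bears directly on the robustness of the entire information system. Web server programs, as key components, have drawn wide attention from attackers, so discovering and fixing their vulnerabilities is essential. Mutation-based greybox fuzzing is widely used to find vulnerabilities in Web server programs; such methods typically take real HTTP messages as "seeds" and mutate them to produce test cases. The quality of the test cases depends on the choice of mutation locations within the seed and on the scheduling of mutation operators. Existing methods mainly follow preset rules for both choices; following these rules blindly makes mutation untargeted, produces many invalid test cases, and lowers fuzzing efficiency. To address these problems, this paper proposes HTTPFuzzer, a reinforcement learning guided greybox fuzzing method for Web server programs. HTTPFuzzer mutates a seed in two stages. The first stage explores mutation locations: the seed is first split into segments, and the exploration of the segments is cast as a multi-armed bandit problem. During this exploration, based on code coverage and on the response messages returned by the test target, the segments that both let the target respond normally and increase code coverage are selected as mutation regions. The second stage schedules mutation operators with Q-learning: with the goal of increasing code coverage, it dynamically adjusts the operator scheduling policy so that the fuzzer chooses better mutation operators for each mutation region. Experiments show that HTTPFuzzer can generate high-quality test cases, covers more program paths in less time than the baseline methods, and triggers crashes and finds vulnerabilities relatively quickly.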
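The first stage can be illustrated with a small sketch. The abstract does not say which bandit algorithm, segmentation rule, or reward HTTPFuzzer uses, so everything below is an assumption made for illustration: the seed is split into request line, header lines, and body; the bandit is plain UCB1; the reward is 1 only when the target responds with a non-5xx status and new coverage is observed; and the names `split_segments`, `SegmentBandit`, `reward`, and `mutation_regions` are hypothetical.

```python
import math


def split_segments(seed: str) -> list[str]:
    """Assumed segmentation: request line, one segment per header, then the body."""
    head, _, body = seed.partition("\r\n\r\n")
    segments = head.split("\r\n")
    if body:
        segments.append(body)
    return segments


def reward(new_coverage: bool, status_code: int) -> float:
    """Assumed reward: 1.0 only if the target answered normally AND coverage grew."""
    responded_normally = 200 <= status_code < 500  # assumption: 5xx counts as abnormal
    return 1.0 if (new_coverage and responded_normally) else 0.0


class SegmentBandit:
    """UCB1 over seed segments; each segment is one arm (illustrative choice)."""

    def __init__(self, n_arms: int, c: float = 1.4):
        self.counts = [0] * n_arms    # how often each segment was mutated
        self.values = [0.0] * n_arms  # running mean reward per segment
        self.c = c                    # exploration weight

    def select(self) -> int:
        # Mutate every segment once before applying the UCB formula.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            v + self.c * math.sqrt(math.log(total) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, r: float) -> None:
        self.counts[arm] += 1
        # Incremental mean update.
        self.values[arm] += (r - self.values[arm]) / self.counts[arm]


def mutation_regions(bandit: SegmentBandit, threshold: float = 0.5) -> list[int]:
    """Indices of explored segments whose mean reward reaches the threshold."""
    return [
        arm
        for arm, v in enumerate(bandit.values)
        if bandit.counts[arm] > 0 and v >= threshold
    ]
```

In a real fuzzer the arguments to `reward` would come from coverage instrumentation and from parsing the server's response; the threshold used to promote a segment to a mutation region is likewise a free parameter of this sketch.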

Keywords: network security; reinforcement learning; Web server; protocol fuzzing; mutation operators

DOI：

Received: 2024-09-03; Revised: 2024-12-09

Funding: National Key Research and Development Program of China (2019YFB2101704)

HTTPFuzzer: Reinforcement Learning Guided Greybox Fuzzing for Web Servers

Chen Qian, Hong Zheng, Jiang Chuan, Zhang Guomin, Qin Sujuan, Gu Jinbang, Cui Shuai

(Command and Control Engineering College, Army Engineering University of PLA)

Abstract:

Web servers are pivotal elements of Internet architecture, handling extensive data exchange and business logic processing, and their security is closely tied to the stability of the information systems built on them. Web server programs, as critical components, have received extensive attention from attackers, so discovering and fixing their vulnerabilities is essential. Mutation-based greybox fuzzing is widely used for vulnerability discovery in Web server programs. Such methods usually take real HTTP messages as "seeds" and mutate the seeds to generate test cases. The quality of the test cases depends on the selection of mutation locations and the scheduling of mutation operators. Existing methods mainly follow preset rules for both. Blindly following preset rules makes mutation less targeted and produces many invalid test cases, which lowers the efficiency of fuzzing. To address these problems, HTTPFuzzer, a reinforcement learning guided greybox fuzzing method for Web server programs, is proposed. HTTPFuzzer's mutation process is divided into two stages. The first stage is mutation location exploration: the seeds are first segmented, and the exploration of the segments is cast as a multi-armed bandit problem. During segment exploration, based on code coverage and the responses of the test target, the segments that both let the test target respond normally and improve code coverage are selected as mutation regions. The second stage is mutation operator scheduling based on Q-learning. This stage aims to improve code coverage and dynamically adjusts the mutation operator scheduling strategy, so that the fuzzer selects better mutation operators for different mutation regions. Experimental results show that HTTPFuzzer can produce high-quality test cases and covers more execution paths than the baseline methods within a shorter time. It is also able to trigger crashes and uncover vulnerabilities in a shorter time.
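The Q-learning stage can be sketched as a tabular scheduler. The abstract does not define the state, action, or reward used by HTTPFuzzer, so this is an illustration under stated assumptions: the state is an identifier of the current mutation region, the action is a mutation operator, the reward is supplied by the caller (for example 1.0 when a test case reaches new coverage), and the operator names and the `OperatorScheduler` class are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical operator names; the paper's actual operator set is not listed here.
OPERATORS = ["bit_flip", "byte_insert", "byte_delete", "dict_replace"]


class OperatorScheduler:
    """Tabular Q-learning with epsilon-greedy operator selection (illustrative)."""

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.operators = list(operators)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # (state, operator) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise take the best-known operator.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.q[(state, op)])

    def update(self, state, op, reward, next_state):
        # Standard Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, o)] for o in self.operators)
        td_target = reward + self.gamma * best_next
        self.q[(state, op)] += self.alpha * (td_target - self.q[(state, op)])
```

A fuzzing loop would call `choose(region)` before each mutation and `update(region, op, reward, next_region)` after executing the test case, so that operators which tend to raise coverage in a given region are picked more often over time.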

Key words: network security; reinforcement learning; Web servers; protocol fuzzing; mutation operators