Intelligent Attack Path Discovery Based on Heuristic Reward Shaping Method

ZENG Qingwei; ZHANG Guomin; XING Changyou; SONG Lihua

Cite this article:

ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua. Intelligent Attack Path Discovery Based on Heuristic Reward Shaping Method[J]. Journal of Cyber Security, 2024, 9(3): 44-58

(College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China)

Abstract:

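Penetration testing is an important means of evaluating the security of network systems: it simulates real network attacks from the attacker's perspective to find the vulnerable points in a network system. Automated penetration testing uses various intelligent methods to automate the penetration testing process, greatly reducing its cost. Attack path discovery is a key technology in automated penetration testing, and how to discover attack paths in a network system quickly and effectively has long received wide attention from academia. Existing automated penetration testing methods mainly rely on the reinforcement learning framework for intelligent attack path discovery, but they still suffer from sparse rewards and low learning efficiency, which slow the convergence of the algorithm and make it hard for attack path discovery to meet the timeliness requirements of penetration testing. To address this, this paper proposes a hierarchical reinforcement learning algorithm with a potential-based heuristic reward shaping function (HRL-HRSF). The algorithm first exploits the characteristics of penetration testing: drawing on prior knowledge of network attacks, it proposes a heuristic based on deep lateral penetration and uses this heuristic to design a potential-based heuristic reward shaping function, which provides positive feedback during the agent's early exploration and effectively alleviates the sparse reward problem. It then combines this shaping function with a hierarchical reinforcement learning algorithm, which not only effectively reduces the size of the environment's state space and action space but also substantially increases the reward feedback the agent receives during attack path discovery, speeding up the agent's learning. Experimental results show that HRL-HRSF is faster and more effective than the hierarchical reinforcement learning algorithm without reward shaping, DQN, and its improved variants, and that as the network size and the number of host vulnerabilities grow, HRL-HRSF maintains better learning efficiency and shows good robustness and generalization.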

Keywords: automated penetration testing; reward shaping; hierarchical reinforcement learning; attack path discovery; DQN algorithm

DOI: 10.19363/J.cnki.cn10-1380/tn.2024.05.04

Received: 2022-07-31; Revised: 2022-11-26

Funding: This work was supported by the National Natural Science Foundation of China (No. 62172432).

Intelligent Attack Path Discovery Based on Heuristic Reward Shaping Method

ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua

(College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China)

Abstract:

Penetration testing is an important means of evaluating the security of network systems: it simulates real network attacks from the attacker's perspective to find the vulnerable points in a network system. Automated penetration testing uses various intelligent methods to automate the penetration testing process, greatly reducing its cost. Attack path discovery is a key technology in automated penetration testing, and how to implement intelligent attack path discovery in network systems quickly and effectively has attracted wide attention from the academic community. Existing automated penetration testing methods are mainly based on the reinforcement learning framework, but they still suffer from sparse rewards and low learning efficiency, which lead to slow convergence, so attack path discovery cannot meet the timeliness requirements of penetration testing. Therefore, a hierarchical reinforcement learning algorithm with a potential-based heuristic reward shaping function (HRL-HRSF) is proposed. The algorithm first uses the characteristics of penetration testing and prior knowledge of network attacks to propose a heuristic based on deep lateral penetration, and uses this heuristic to design a potential-based heuristic reward shaping function that provides positive feedback during the agent's early exploration, effectively alleviating the sparse reward problem. Combining the shaping function with a hierarchical reinforcement learning algorithm then not only reduces the size of the environment's state space and action space, but also greatly improves the reward feedback the agent receives during attack path discovery and accelerates its learning. The experimental results show that HRL-HRSF is faster and more effective than the hierarchical reinforcement learning algorithm without reward shaping, DQN, and its improved variants. As the network size and the number of host vulnerabilities increase, HRL-HRSF maintains better learning efficiency and shows good robustness and generalization.
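To make the idea of a potential-based shaping function concrete, the following is a minimal sketch of the standard form F(s, s') = γ·Φ(s') − Φ(s), which is added to the environment reward. The abstract does not specify the paper's actual potential function, so everything named here is an assumption for illustration only: the host names, the `HOST_DEPTH` table, the `depth_potential` heuristic (maximum hop depth among compromised hosts, loosely inspired by the "deep lateral penetration" idea), and the convention that the potential of a terminal state is zero.

```python
from typing import Callable, FrozenSet

State = FrozenSet[str]  # hypothetical state: the set of compromised hosts

# Hypothetical hop distance of each host from the attacker's entry point.
HOST_DEPTH = {"dmz_web": 1, "office_pc": 2, "db_server": 3}


def depth_potential(state: State) -> float:
    """Illustrative potential: depth of the deepest compromised host (0 if none)."""
    return float(max((HOST_DEPTH[h] for h in state), default=0))


def make_shaping(
    potential: Callable[[State], float], gamma: float
) -> Callable[..., float]:
    """Build F(s, s') = gamma * phi(s') - phi(s), treating phi(terminal) as 0."""

    def shaping(state: State, next_state: State, done: bool = False) -> float:
        next_phi = 0.0 if done else potential(next_state)
        return gamma * next_phi - potential(state)

    return shaping


GAMMA = 0.9
shape = make_shaping(depth_potential, GAMMA)

s0: State = frozenset()
s1: State = frozenset({"dmz_web"})
s2: State = frozenset({"dmz_web", "office_pc"})

# With a sparse environment reward of 0, the shaped reward still gives feedback.
env_reward = 0.0
shaped_first_step = env_reward + shape(s0, s1)   # moving deeper: positive bonus
shaped_second_step = env_reward + shape(s1, s2)  # moving deeper again: positive
shaped_no_progress = env_reward + shape(s1, s1)  # no progress: small penalty
```

In this sketch the agent receives a positive signal each time it compromises a deeper host, even though the environment itself pays nothing until the goal is reached. Because the bonus is a difference of potentials, it is the kind of shaping term generally understood to leave the optimal policy unchanged, which is why the potential-based form is a common choice for sparse-reward problems.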

Key words: automated penetration testing; reward shaping; hierarchical reinforcement learning; attack path discovery; DQN algorithm