Network Attack Path Planning Method based on Deep Reinforcement Learning

GAO Wenlong; ZHOU Tianyang; ZHAO Ziheng; ZHU Junhu

Cite this article:

GAO Wenlong, ZHOU Tianyang, ZHAO Ziheng, ZHU Junhu. Network Attack Path Planning Method based on Deep Reinforcement Learning[J]. Journal of Cyber Security (信息安全学报), 2022, 7(5): 65-78

(Information Engineering University, Zhengzhou 450001, China)

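Abstract:

Attack path planning is essential to automated penetration testing. In real environments, an attacker can rarely obtain complete and accurate network and configuration information. To address attack path planning in unknown penetration testing environments, this paper proposes an attack path planning method based on deep reinforcement learning. First, the state space and action space of the penetration testing problem are formally described, and information-gathering actions are introduced to strengthen the agent's perception of the environment. Then, the agent learns through autonomous interaction with the environment, searching for the optimal policy that maximizes long-term return, which in turn guides the attacker's path planning. Existing deep reinforcement learning algorithms applied to attack path planning suffer from poor adaptability and difficult convergence, which limits their ability to handle complex penetration testing environments. The action sequences an agent obtains through blind exploration early in training deteriorate sharply in quality as the dimensionality grows rapidly, and sometimes the agent can hardly reach the goal; moreover, the accumulation of large numbers of low-quality action sequences can prevent the algorithm from converging and can even cause neuron death. To address this problem, the proposed algorithm extends DDQN with path-heuristic information and a depth-first penetration action selection strategy. Path-heuristic information makes full use of historical experience to guide the agent's learning early in training, while the depth-first penetration action selection strategy prunes the action space to some extent and thereby accelerates learning. Finally, comparison with other deep reinforcement learning algorithms under the same experimental conditions shows that the proposed algorithm converges faster and reduces running time by more than 30%.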

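Key words: deep reinforcement learning; path-heuristic information; depth-first penetration action selection strategy; attack path planning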
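The abstract names DDQN as the base algorithm without restating it. As a reminder of what the proposed method builds on, here is a minimal sketch of the bootstrap target, assuming DDQN refers to standard Double DQN. The function name `ddqn_target` and its parameters are illustrative and are not taken from the paper.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    done: bool,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Double DQN bootstrap target for a single transition.

    q_online_next / q_target_next hold the Q-values of every action in the
    next state, as estimated by the online and target networks respectively
    (hypothetical inputs; any function approximator could produce them).
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # The online network selects the greedy action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]


# Example with made-up Q-values for three actions.
y = ddqn_target(
    reward=1.0,
    done=False,
    gamma=0.5,
    q_online_next=[0.2, 0.8, 0.5],
    q_target_next=[1.0, 2.0, 3.0],
)
```

In this example the online network prefers action 1, so the target is 1.0 + 0.5 × 2.0 = 2.0. A plain DQN target would instead take the maximum of the target network's values (3.0) and yield 2.5. Decoupling selection from evaluation is how Double DQN reduces overestimation.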

DOI: 10.19363/J.cnki.cn10-1380/tn.2022.09.06

Received: 2021-09-23; Revised: 2021-11-22


Network Attack Path Planning Method based on Deep Reinforcement Learning

GAO Wenlong, ZHOU Tianyang, ZHAO Ziheng, ZHU Junhu

(Information Engineering University, Zhengzhou 450001, China)

Abstract:

Attack path planning is of great significance to automated penetration testing. In a real environment, it is difficult for an attacker to obtain comprehensive and accurate network and configuration information. For network attack path planning in unknown penetration testing environments, a path planning method based on deep reinforcement learning is proposed. First, the state space and action space of the penetration testing problem are formally described, and information-gathering actions are introduced to enhance perception of the environment. Then, the agent learns through autonomous interaction with the environment to find the optimal policy that maximizes long-term return, which guides the attacker in planning the path. Current deep reinforcement learning algorithms applied to attack path planning suffer from poor adaptability and difficulty in converging, which limit their ability to handle complex penetration testing environments. The quality of the action sequences obtained by the agent through blind exploration in the initial training stage drops sharply when the dimensionality increases rapidly, and sometimes the goal is difficult to reach. Moreover, a large accumulation of low-quality action sequences can cause the algorithm to fail to converge and can even lead to neuron death. In response to this problem, the deep reinforcement learning algorithm proposed in this paper adds path-heuristic information and a depth-first penetration action selection strategy on top of the DDQN algorithm. Path-heuristic information makes full use of historical experience to guide the learning process of the agent in the early stage of training. The depth-first penetration action selection strategy prunes the action space to a certain extent, accelerating the learning process of the agent. Finally, a comparison with other deep reinforcement learning algorithms under the same experimental conditions verifies that the proposed algorithm converges faster and shortens the running time by more than 30%.

Key words: deep reinforcement learning; path-heuristic information; depth-first penetration action selection strategy; attack path planning
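The abstract says the depth-first penetration strategy prunes the action space but does not give the pruning rule. The sketch below shows one hypothetical way such pruning could work: before epsilon-greedy selection, candidate actions are masked down to those that target the deepest hosts discovered so far. The `(host, action name)` encoding, the `depth` map, and both function names are assumptions made for illustration and are not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical action encoding: (target host, action name), e.g. ("db", "scan").
Action = Tuple[str, str]


def depth_first_candidates(actions: List[Action], depth: Dict[str, int]) -> List[Action]:
    """Keep only the actions aimed at the deepest hosts among the candidates.

    `depth` maps each discovered host to its assumed distance (in hops)
    from the attacker's entry point.
    """
    if not actions:
        return []
    deepest = max(depth[host] for host, _ in actions)
    return [a for a in actions if depth[a[0]] == deepest]


def select_action(
    q_values: Dict[Action, float],
    actions: List[Action],
    depth: Dict[str, int],
    epsilon: float,
    rng: random.Random,
) -> Action:
    """Epsilon-greedy selection restricted to the pruned candidate set."""
    candidates = depth_first_candidates(actions, depth)
    if rng.random() < epsilon:
        # Explore, but only among depth-first candidates.
        return rng.choice(candidates)
    # Exploit: unseen actions default to a Q-value of 0.0 (an assumption).
    return max(candidates, key=lambda a: q_values.get(a, 0.0))


# Toy example: "scan" stands in for an information-gathering action.
actions = [("web", "scan"), ("web", "exploit"), ("db", "scan"), ("db", "exploit")]
depth = {"web": 1, "db": 2}
q_values = {("web", "exploit"): 9.0, ("db", "exploit"): 1.0, ("db", "scan"): 0.5}

pruned = depth_first_candidates(actions, depth)
greedy = select_action(q_values, actions, depth, epsilon=0.0, rng=random.Random(0))
```

Here `pruned` contains only the two actions against `db`, so the greedy choice is `("db", "exploit")` even though an action against `web` has a higher Q-value. Under this assumed rule, pruning trades some optimality for a smaller search space, in line with the abstract's statement that the action space is pruned "to a certain extent".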