Host Threat Detection by Integrating Data Provenance Graphs with Network Traffic

JIANG Zheyu; WANG Zhiliang; WANG Ming; LI Weixun; LI Xinpeng; ZHENG Feng

Cite this article:

JIANG Zheyu, WANG Zhiliang, WANG Ming, LI Weixun, LI Xinpeng, ZHENG Feng. Host Threat Detection by Integrating Data Provenance Graphs with Network Traffic[J]. Journal of Cyber Security, accepted.

JIANG Zheyu¹, WANG Zhiliang¹, WANG Ming², LI Weixun³, LI Xinpeng², ZHENG Feng³
(1. Tsinghua University; 2. State Grid Corporation of China; 3. State Grid Hebei Electric Power Co., Ltd.)

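Abstract:

Host threats are attacks that target hosts, such as viruses, trojans, worms, and other malware, and they can seriously affect the information security of electric power systems. The mainstream approach in current host threat detection research is to build a data provenance graph (hereafter, provenance graph) from host-generated logs and perform detection on it. In the context of host threat detection, a provenance graph is a directed acyclic graph whose nodes are system entities of the host, such as processes, files, and sockets, and whose directed edges are events. When provenance graphs were first applied to host threat detection, the remote hosts connected to the monitored host over the network were handled in a simplified way. Most existing work built on that basis emphasizes interactions among entities inside the host and overlooks the importance of network traffic data. In addition, many current studies rely on samples of host threat events or on expert knowledge to detect threats. How to cope effectively with continually emerging unknown attacks when neither is available is also a pressing open problem. To address these two problems, this paper combines network traffic data with host data and proposes a host threat detection method that integrates host logs and network traffic, the Traffic Flow Provenance threat detector (TFProv). TFProv uses a heterogeneous provenance graph and zero-positive learning to detect unknown threats. We obtained data for three hosts, each subjected to a different host threat, from a large public dataset, evaluated TFProv on them, and compared it with the best currently published method. The experiments show that our method achieves an average F1-score of 0.978 across the three hosts, effectively improving host threat detection capability over existing methods.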

Keywords: host threat detection; malicious traffic analysis; data provenance graph; graph neural network


Submitted: 2025-04-25; Revised: 2025-09-03


Host Threat Detection by Integrating Data Provenance Graphs with Network Traffic

JIANG Zheyu¹, WANG Zhiliang¹, WANG Ming², LI Weixun³, LI Xinpeng², ZHENG Feng³

(1. Tsinghua University; 2. State Grid Corporation of China; 3. State Grid Hebei Electric Power Co., Ltd.)

Abstract:

Host threats are attacks that target hosts, such as viruses, trojans, worms, and other malware, and they can have a significant impact on the information security of electric power systems. Currently, the mainstream approach in host threat detection research is to construct data provenance graphs (hereafter, provenance graphs) from host-generated logs and perform detection on them. In the context of host threat detection, a provenance graph is a directed acyclic graph whose nodes are system entities of the protected host, such as processes, files, and sockets, and whose directed edges are events. When provenance graphs were first applied to host threat detection, remote hosts connected to the target host over the network were represented in simplified form. Subsequent studies have mostly focused on interactions among entities inside the host while neglecting the importance of network traffic data. Moreover, many current studies rely on host threat event samples or expert knowledge for threat detection. How to handle emerging unknown attacks effectively when neither is available is also an urgent open problem. To address these two issues, this paper combines network traffic data with host data and proposes a host threat detection system called the Traffic Flow Provenance threat detector (TFProv). TFProv uses a heterogeneous provenance graph and zero-positive learning to detect unknown threats. We evaluated TFProv on data from three hosts, each subjected to a different host threat, obtained from a large public dataset, and compared it with the state-of-the-art method. The results show that the proposed method achieves an average F1-score of 0.978 across the three hosts, outperforming the state-of-the-art approach.

Keywords: host-based threat detection; malware traffic analysis; data provenance graph; graph neural network
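
Illustrative sketch (not part of the published abstract): the abstract defines a provenance graph as system entities (processes, files, sockets) connected by directed event edges, and notes that remote hosts were originally simplified away. The Python sketch below shows one way such a graph could be represented, with a remote endpoint added as its own node type. The names `Node`, `ProvenanceGraph`, the `remote_host` node kind, and the `flow` event are hypothetical choices made for illustration; they are assumptions and should not be read as TFProv's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    # kind is "process", "file", or "socket" as in the abstract;
    # "remote_host" is an assumed extra type for network peers.
    kind: str
    name: str


@dataclass
class ProvenanceGraph:
    # Each edge is (source node, event name, destination node, timestamp).
    edges: list = field(default_factory=list)

    def add_event(self, src, event, dst, ts):
        self.edges.append((src, event, dst, ts))

    def nodes(self):
        return {n for s, _, d, _ in self.edges for n in (s, d)}

    def count_by_kind(self):
        counts = defaultdict(int)
        for n in self.nodes():
            counts[n.kind] += 1
        return dict(counts)

    def descendants(self, start):
        # All nodes reachable from start by following directed edges.
        seen, stack = set(), [start]
        while stack:
            cur = stack.pop()
            for s, _, d, _ in self.edges:
                if s == cur and d not in seen:
                    seen.add(d)
                    stack.append(d)
        return seen


def build_example():
    # Hypothetical events; addresses come from documentation ranges.
    g = ProvenanceGraph()
    bash = Node("process", "bash")
    curl = Node("process", "curl")
    sock = Node("socket", "10.0.0.5:44321")
    remote = Node("remote_host", "203.0.113.7:443")
    payload = Node("file", "/tmp/payload")
    g.add_event(bash, "fork", curl, 1)
    g.add_event(curl, "connect", sock, 2)
    g.add_event(sock, "flow", remote, 3)  # assumed traffic-derived edge
    g.add_event(curl, "write", payload, 4)
    return g, bash
```

Calling `build_example()` and then `g.descendants(bash)` returns every entity causally downstream of the shell process, including the remote endpoint. Keeping the remote endpoint as a typed node is what makes the graph heterogeneous in this sketch; per-flow traffic features could then be attached to that node or to its edge, although how TFProv does this is not stated in the abstract.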