Cite this article:
- 孟祥帅,欧金浒,邱堃,赵进.基于域名规则自动生成的网络流量应用识别方法[J].信息安全学报,已采用
- Meng Xiangshuai, Ou Jinhu, Qiu Kun, Zhao Jin. Network traffic application identification method based on automatic generation of domain name rules[J]. Journal of Cyber Security, accepted
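Abstract:
Traffic classification is an important component of the network security system; its accuracy and real-time performance directly affect a network's defensive capability and response efficiency. Application identification, an important subtask of traffic classification, aims to map network traffic precisely to specific application services, and it matters for improving the accuracy of security detection, optimizing resource allocation, and countering malicious traffic. In recent years, with the rapid development of artificial intelligence, many studies have applied machine learning and deep learning to traffic application identification. Although these methods achieve excellent classification accuracy, the inference latency caused by their high model complexity severely limits their deployment in real-time scenarios that require low latency. To address this, this paper proposes a network traffic application identification method based on automatically generated domain name rules. Unlike traditional approaches such as port-based identification or deep packet inspection (DPI), the method uses the domain name information in network traffic as a key classification feature. It first tokenizes the domain name field of the traffic data and extracts structured features for model training. It then analyzes and quantifies the interpretability of the machine learning model to automatically extract and generate a precise set of domain name rules for application identification, and combines these rules with existing regular expression matching tools, which maintains high classification accuracy while markedly reducing classification latency. In evaluations on a large number of network datasets, classification latency was 20 to 60 times lower than that of machine learning methods and about a thousand times lower than that of deep learning methods, demonstrating the method's potential for practical use.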
Keywords: traffic classification; application identification; automatic rule generation; low latency
DOI:
Submitted: 2024-12-31; revised: 2025-05-07
Funding:
Network traffic application identification method based on automatic generation of domain name rules
Meng Xiangshuai, Ou Jinhu, Qiu Kun, Zhao Jin
(Fudan University)
Abstract:
Traffic classification is an important part of the network security system. Its accuracy and real-time performance directly determine a network's security protection capability and response efficiency. Application identification, one of the important subtasks of traffic classification, aims to map network traffic accurately to specific application services, which is of great significance for improving the accuracy of network security detection, optimizing resource allocation, and responding to malicious traffic. In recent years, with the rapid development of artificial intelligence, many studies have applied machine learning and deep learning methods to traffic application identification. Although these methods perform very well in classification accuracy, the inference latency caused by their high model complexity greatly limits their deployment in real-time scenarios that require low latency. To address this issue, this paper proposes a network traffic application identification method based on the automatic generation of domain name rules. Unlike traditional methods based on port numbers or deep packet inspection (DPI), this method makes full use of the domain name information in network traffic as an important classification feature. The method first tokenizes the domain name field of the traffic data and extracts structured features for model training. Then, by analyzing and quantifying the interpretability of the machine learning model, it automatically extracts and generates a precise set of domain name rules for application identification, and combines these rules with existing regular expression matching tools to significantly reduce classification latency while maintaining high classification accuracy. In evaluations on a large number of network datasets, the method's classification latency was 20 to 60 times lower than that of machine learning methods and about a thousand times lower than that of deep learning methods, demonstrating the practical application potential of this research.
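The pipeline described in the abstract (tokenize domain names, learn which tokens indicate an application, emit regular expression rules, match new traffic against them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tokenization scheme, the thresholds, the sample domains under the reserved `.example` TLD, and all function names are hypothetical. In particular, the token purity score is a simple stand-in for quantifying the interpretability of a trained model, and Python's `re` module stands in for a dedicated regular expression matching tool.

```python
import re
from collections import Counter, defaultdict

# Hypothetical labeled samples: (domain name, application label).
SAMPLES = [
    ("video.tube.example", "tube"),
    ("img.tube.example", "tube"),
    ("api.chat.example", "chat"),
    ("cdn.chat.example", "chat"),
]


def tokenize(domain):
    """Split a domain name into lowercase tokens on dots, hyphens, underscores."""
    return [t for t in re.split(r"[.\-_]", domain.lower()) if t]


def generate_rules(samples, min_purity=0.9, min_support=2):
    """Build one compiled regex per application from highly indicative tokens.

    Purity is the share of a token's occurrences that belong to its most
    frequent application. It is an assumed stand-in for model-derived
    feature importance, not the paper's actual scoring.
    """
    token_app_counts = defaultdict(Counter)
    for domain, app in samples:
        for token in set(tokenize(domain)):
            token_app_counts[token][app] += 1

    tokens_by_app = defaultdict(list)
    for token, counts in token_app_counts.items():
        app, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if hits >= min_support and hits / total >= min_purity:
            tokens_by_app[app].append(token)

    rules = {}
    for app, tokens in tokens_by_app.items():
        alternation = "|".join(re.escape(t) for t in sorted(tokens))
        # Match a token only as a whole segment, never as a substring.
        rules[app] = re.compile(
            rf"(?:^|[.\-_])(?:{alternation})(?:$|[.\-_])"
        )
    return rules


def classify(domain, rules):
    """Return the first application whose rule matches, or None."""
    lowered = domain.lower()
    for app, pattern in rules.items():
        if pattern.search(lowered):
            return app
    return None


rules = generate_rules(SAMPLES)
print(classify("live.tube.example", rules))
print(classify("www.other.example", rules))
```

In this sketch the shared token `example` is rejected because it appears equally under both labels, while `tube` and `chat` each become a rule. Classification then costs only one regex search per rule, which is the kind of lightweight matching step the abstract credits for the latency reduction.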
Keywords: traffic classification; application identification; automatic rule generation; low latency