Malicious Traffic Classification Based on Indefinite Length Convolutional Neural Network

YANG Xuan; WU Jiangxing; ZHAO Bo

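(School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; China National Digital Switching System Engineering & Technological R&D Center, Zhengzhou 450002, China; PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China)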

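Abstract:

In an era of information explosion and rapid network growth, network attacks and threats keep increasing, and malicious traffic identification plays a very important role in network security. Deep learning has shown superior performance in image processing and natural language processing, so many studies have applied it to traffic classification. When doing so, some studies truncate or zero-pad the raw traffic data: truncation tends to discard part of the traffic information, while zero padding tends to introduce information that is useless for model training. To address this problem, this paper proposes an Indefinite Length Convolutional Neural Network (ILCNN) for malicious traffic classification. The model accepts indefinite-length input, taking raw traffic data that is neither truncated nor zero-padded, and uses a pooling operation to convert indefinite-length feature vectors into fixed-length feature vectors, which are then used to classify malicious traffic. Experimental results on the CICIDS-2017 dataset show that the ILCNN model reaches an F1-Score of 0.999208. Compared with existing work on malicious traffic classification, the proposed ILCNN improves on both F1-Score and accuracy.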
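For reference, the F1-Score reported above is conventionally the harmonic mean of precision and recall. The abstract does not say how the score is averaged across traffic classes, so the per-class form below is the standard definition, given here as an assumption about what is being reported; TP, FP and FN denote true positives, false positives and false negatives for a given class.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```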

Keywords: malicious traffic; traffic classification; convolutional neural network; indefinite-length input

DOI：10.19363/J.cnki.cn10-1380/tn.2022.07.07

Received: 2021-04-30; Revised: 2021-08-01

Funding:

Malicious Traffic Classification Based on Indefinite Length Convolutional Neural Network

YANG Xuan, WU Jiangxing, ZHAO Bo

School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; China National Digital Switching System Engineering & Technological R&D Center, Zhengzhou 450002, China; Information Engineering University, Zhengzhou 450001, China

Abstract:

In today's era of information explosion and rapid network development, network attacks and threats are increasing, and malicious traffic identification plays a very important role in network security. Deep learning has shown superior performance in image processing and natural language processing, so many studies have applied it to traffic classification. When applying convolutional neural networks to traffic classification, some studies truncate or zero-pad the raw traffic data. Truncation may discard part of the traffic information, and zero padding may introduce information that is useless for model training; both can reduce the detection accuracy of the model. In this paper, we propose an Indefinite Length Convolutional Neural Network (ILCNN) for malicious traffic classification. The model takes indefinite-length input, feeding in raw traffic data that is neither truncated nor zero-padded, and uses a pooling operation to transform indefinite-length feature vectors into fixed-length feature vectors for classifying malicious traffic. Because ILCNN uses the raw traffic data and retains all of its information, it can extract features more effectively during training, avoids both the loss of traffic information and the introduction of useless information, and eliminates the tedious process of manual feature extraction. The model uses multiple convolution kernels of different sizes to extract features from different fields of view of the traffic data, which facilitates the subsequent classification of malicious traffic. Experimental results on the CICIDS-2017 dataset show that the ILCNN model reaches an F1-Score of 0.999208. Compared with existing work on malicious traffic classification, the proposed ILCNN improves on both F1-Score and accuracy.
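The core idea in the abstract, convolution kernels of several sizes followed by pooling that yields a fixed-length vector from input of any length, can be illustrated with a minimal pure-Python sketch. It is not the authors' ILCNN architecture: the kernel widths (3, 5, 7), the two kernels per width, the random weights, the use of global max pooling as the pooling operation, and the function names `conv1d_relu` and `fixed_length_features` are all assumptions made for illustration.

```python
import random


def conv1d_relu(seq, kernel):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(w * x for w, x in zip(kernel, seq[i:i + k])))
        for i in range(len(seq) - k + 1)
    ]


def fixed_length_features(byte_seq, kernels):
    """Map a byte sequence (at least as long as the widest kernel)
    to exactly len(kernels) features."""
    # Scale raw bytes to [0, 1]; no truncation and no zero padding is applied.
    seq = [b / 255.0 for b in byte_seq]
    # Global max pooling keeps one value per kernel, so the output size
    # depends only on the number of kernels, not on the input length.
    return [max(conv1d_relu(seq, kernel)) for kernel in kernels]


rng = random.Random(0)
# Hypothetical, randomly initialised kernels: two each of widths 3, 5 and 7.
kernels = [
    [rng.uniform(-1, 1) for _ in range(width)]
    for width in (3, 5, 7)
    for _ in range(2)
]

# Two stand-in "flows" of very different lengths (random bytes, not real traffic).
short_flow = bytes(rng.randrange(256) for _ in range(40))
long_flow = bytes(rng.randrange(256) for _ in range(1500))

print(len(fixed_length_features(short_flow, kernels)))  # 6
print(len(fixed_length_features(long_flow, kernels)))   # 6
```

Both flows produce a six-element vector because pooling collapses the length dimension of each feature map to a single value. A classifier placed after this step therefore sees a fixed input size even though the raw flows were neither truncated nor padded.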

Key words: malicious traffic; traffic classification; convolutional neural network; variable length input