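Abstract:
When detecting and preventing abnormal network behavior, intrusion detection systems suffer from low accuracy and high false alarm rates because traffic data is massive and high-dimensional. This paper proposes an intrusion detection method based on multi-dimensional optimization of traffic anomaly analysis, which optimizes intrusion detection data along two dimensions, horizontal and vertical. In the horizontal dimension, the categories with many samples are sampled, and a genetic algorithm finds the best sampling ratio for each category, which balances the data. In the vertical dimension, features are selected by a recursive feature addition algorithm combined with an analysis of the correlation between features and categories, and an average recall metric is proposed to evaluate the feature selection, yielding a low-dimensional, efficient training set. A random forest classifier is then trained on the optimized data, and the proposed algorithm is evaluated and validated on the real dataset UNSW_NB15. Compared with other algorithms, it achieves high accuracy and a low false alarm rate, and it obtains effective recall on the attack types.
Keywords: intrusion detection framework; multi-dimensional optimization; data sampling; recursive feature addition algorithm; genetic algorithm parameter optimization; random forest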
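DOI: 10.19363/J.cnki.cn10-1380/tn.2019.01.02
Received: 2018-09-30; Revised: 2018-11-24
Funding: This work was supported by the National Key Research and Development Program of China (No. 2016YFB0800700); the National Natural Science Foundation of China (No. 61472341, No. 61772449, No. 61572420, No. 61807028, No. 61802332); the Natural Science Foundation of Hebei Province (No. F2016203330); and the Postdoctoral Research Merit-Based Funding Program (No. B2017003005).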
|
An intrusion detection method based on multi-dimensional optimization of traffic anomaly analysis |
LIU Xinqian,SHAN Chun,REN Jiadong,WANG Qian,GUO Jiawei |
Department of Information Science and Engineering, Yanshan University, Qinhuangdao 066001, China;Hebei Key Laboratory of Software Engineering, Qinhuangdao 066001, China;Beijing Institute of Technology, Beijing 100081, China;Beijing Key Laboratory of Software Security Engineering Technology, Beijing 100081, China |
Abstract: |
Intrusion detection systems that detect and prevent anomalous network behavior suffer from low accuracy and a high false alarm rate because traffic data is massive and high-dimensional. This paper proposes an intrusion detection method based on multi-dimensional optimization of traffic anomaly analysis, in which both the horizontal and the vertical dimension of the intrusion detection dataset are optimized. In the horizontal dimension, the categories with a large number of samples are sampled, and the optimal sampling proportion for each category is obtained by a genetic algorithm, which balances the data. In the vertical dimension, a recursive feature addition algorithm, combined with a correlation analysis between features and labels, is used to select features, and an average recall metric is proposed to evaluate the feature selection, which yields a low-dimensional and efficient training set. A random forest classifier is then trained on the optimized dataset, and the real dataset UNSW_NB15 is used to evaluate and validate the proposed method. Compared with other algorithms, the proposed algorithm achieves high accuracy and a low false alarm rate, and it obtains effective recall on the attack categories.
Key words: intrusion detection framework; multi-dimensional optimization; data sampling; recursive feature addition; genetic algorithm parameter optimization; random forest
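The three building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the exact definitions, so several points are assumptions: "average recall" is taken to be the unweighted mean of per-class recall, the per-class sampling ratios are passed in as given (the paper tunes them with a genetic algorithm, which is omitted here), and recursive feature addition is taken to mean walking a pre-ranked feature list and keeping a feature only when it improves a caller-supplied score. The names `average_recall`, `undersample`, `recursive_feature_addition`, and `score_fn` are hypothetical.

```python
import random
from collections import defaultdict


def average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed reading of 'average recall')."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def undersample(rows, labels, ratios, seed=0):
    """Keep a fraction ratios[label] of each class; classes not listed are kept whole.

    In the paper the ratios come from a genetic algorithm; here they are inputs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    keep = []
    for label, idx in by_class.items():
        k = max(1, round(len(idx) * ratios.get(label, 1.0)))
        keep.extend(rng.sample(idx, k))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def recursive_feature_addition(ranked_features, score_fn):
    """Add features in ranked order, keeping one only if it raises score_fn.

    ranked_features: features sorted by relevance (e.g. correlation with the label).
    score_fn: hypothetical callback that trains and evaluates a classifier on a
              feature subset and returns a score such as average recall.
    """
    selected, best = [], float("-inf")
    for feature in ranked_features:
        score = score_fn(selected + [feature])
        if score > best:
            selected.append(feature)
            best = score
    return selected, best
```

In practice `score_fn` would train a classifier such as a random forest on the undersampled training set restricted to the candidate features and return its average recall on held-out data.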