Cite this article
  • Zhong Wenkang, Wang Tian, Zhang Gongxuan. Phishing URL Detection Method Based on Component Segmentation [J]. Journal of Cyber Security, accepted.

Phishing URL Detection Method Based on Component Segmentation
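(1. School of Cyber Science and Engineering, Nanjing University of Science and Technology; 2. School of Information Engineering, Jiangsu College of Finance & Accounting; 3. School of Computer Science and Engineering, Nanjing University of Science and Technology)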
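Abstract:
The URL is the most direct and most important feature of a phishing website, and using deep learning to extract features from the tokenized URL character sequence can greatly improve the accuracy of URL-based phishing website identification. Splitting a URL by its components is a common URL tokenization technique that allows the features of different components to be discriminated at multiple granularities, but it has not been applied effectively to phishing URL detection and still lacks in-depth study. In addition, because of limitations in experimental data and model training methods, existing deep learning-based phishing URL detection methods still fall short in generalization ability and false alarm rate, and struggle to meet the complex identification needs of real-world environments. To address these problems, this paper proposes a phishing website detection method based on component segmentation: (1) The method first splits the URL into its components and then applies character-level tokenization, truncation and padding, and encoding to each component in turn, so that the deep learning model can handle different components at different levels and thus discriminate features at a fine granularity. (2) To avoid the problem that the pooling strategy used in convolutional neural networks focuses too much on local features and neglects the overall spatial structure of the features, the proposed method uses a capsule network to further extract features from the fused component features. (3) An adversarial training mechanism is introduced into model training, with independent adversarial training applied to the multiple embedding layers to support the model's differentiated handling of each component, further improving the model's generalization ability. Finally, on a dataset of millions of samples, compared with existing state-of-the-art methods of the same kind, our method improves phishing URL identification accuracy by 0.86%, reduces the false alarm rate by 1.08%, and improves F1-Score by 0.95%.
Keywords:  phishing URL detection  capsule network  adversarial training  data processing  deep learning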
DOI:
Received: 2023-02-27  Revised: 2023-04-25
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Phishing URL Detection Method Based On Component Segmentation
Zhong Wenkang1, Wang Tian2, Zhang Gongxuan3
(1. School of Cyber Science and Engineering, Nanjing University of Science and Technology; 2. School of Information Engineering, Jiangsu College of Finance & Accounting; 3. School of Computer Science and Engineering, Nanjing University of Science and Technology)
Abstract:
The URL is the most direct and most important feature of a phishing website, and using deep learning to extract features from tokenized URL character sequences can greatly improve the accuracy of URL-based phishing detection. Segmenting URLs by component is a common URL processing method that lets a model discriminate between components at different granularities, but it has not been applied effectively to phishing URL detection, and its effectiveness still needs to be demonstrated experimentally. Because of limitations in experimental data and model training methods, existing deep learning-based phishing URL detection methods still fall short in generalization ability and false alarm rate, and struggle to meet the complex needs of real-world environments. To solve these problems, this paper proposes a phishing website detection method based on component segmentation: (1) The method first segments each URL into its components and then applies character-level tokenization, truncation and padding, and encoding to each component, so that the deep learning model can treat different components at different levels and discriminate features at a fine granularity. (2) Because the pooling strategy used in convolutional neural networks (CNNs) focuses on local features and ignores the overall spatial structure of the features, the proposed method uses a capsule network (CapsNet) to further extract features from the fused component features. (3) An adversarial training mechanism is introduced into model training, with independent adversarial training applied to the multiple embedding layers to support the model's differentiated handling of each component, further enhancing the generalization capability of the model.
In extensive experiments on a dataset of millions of samples, the method improves accuracy by 0.86%, reduces the false alarm rate by 1.08%, and improves F1-Score by 0.95% compared with existing state-of-the-art methods.
Key words:  Phishing URL detection  capsule networks  adversarial training  data processing  deep learning
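
Step (1) of the abstract (component segmentation followed by character-level tokenization, truncation and padding, and encoding) can be illustrated with a minimal Python sketch. The abstract does not specify the component set, the per-component lengths, or the vocabulary, so `MAX_LEN`, `VOCAB`, `PAD_ID`, `UNK_ID`, and all function names below are hypothetical choices for illustration only, not the authors' implementation.

```python
import string
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical settings: the abstract gives no component names, lengths,
# or vocabulary, so every value here is an illustrative assumption.
MAX_LEN = {"scheme": 8, "host": 32, "path": 48, "query": 32}
PAD_ID, UNK_ID = 0, 1
ALPHABET = string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
VOCAB = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def split_components(url: str) -> Dict[str, str]:
    """Split a URL into components with the standard library parser."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "query": parts.query,
    }


def encode_component(text: str, max_len: int) -> List[int]:
    """Tokenize by character, truncate to max_len, map to ids, then pad."""
    ids = [VOCAB.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def encode_url(url: str) -> Dict[str, List[int]]:
    """Return one fixed-length id sequence per URL component."""
    components = split_components(url)
    return {name: encode_component(components[name], MAX_LEN[name])
            for name in MAX_LEN}


if __name__ == "__main__":
    encoded = encode_url("http://login.example.com/verify/account?id=42")
    for name, ids in encoded.items():
        print(name, len(ids), ids[:8])
```

Keeping one fixed-length sequence per component is what would allow each component to feed its own embedding layer, which is the arrangement the abstract's per-embedding adversarial training presupposes.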