Cite this article
  • Wang Xuren, Wei Xinxin, Wang Yuanyuan, Jiang Zhengwei, Jiang Jun, Yang Peian, Liu Runshi. A Survey of Cyber Threat Intelligence Entity Recognition Research[J]. Journal of Cyber Security, accepted.
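A Survey of Cyber Threat Intelligence Entity Recognition Research
Wang Xuren1, Wei Xinxin1, Wang Yuanyuan2, Jiang Zhengwei3, Jiang Jun3, Yang Peian3, Liu Runshi1
(1. Information Engineering College, Capital Normal University; 2. Unit 31005 of the PLA; 3. Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences)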
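Abstract:
As the network environment grows more complex and the cybersecurity situation more severe, protecting networks from external attacks has become an important task. Cyber threat intelligence (CTI) emerged to shift cyberspace attack and defense toward active defense: by analyzing and detecting CTI and collecting intelligence evidence, attacks can be prevented before they occur. Sharing CTI to defend against cyber attacks has therefore become increasingly important. However, CTI is usually shared in unstructured form, and converting it into semi-structured or structured data, which named entity recognition (NER) makes possible, is essential for many downstream tasks. Although NER in the general domain has achieved strong results, many problems remain in the CTI domain. This paper first introduces the background of threat intelligence and its relationship to NER. It then reviews, in the chronological order in which NER techniques developed, rule- and dictionary-based methods, unsupervised learning methods, feature-based supervised learning methods, and deep learning methods, among others, and comprehensively summarizes the current state and future directions of NER research in the threat intelligence domain. Finally, it compares the corpora used for NER in the threat intelligence domain, runs experiments with state-of-the-art (SOTA) deep learning methods, and analyzes the problems in existing threat intelligence datasets. The proposed BBC (BERT-BiGRU-CRF) deep learning entity recognition model achieves the best experimental results, with F1 scores of 97.36%, 90.40%, 82.87%, and 73.91% on the AutoLabel, DNRTI, CTIReports, and APTNER datasets, respectively.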
Key words:  Named Entity Recognition  Cyber Threat Intelligence (CTI)  Deep Learning  CTI Datasets
Submitted: 2023-02-04  Revised: 2023-05-15
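The abstract names rule- and dictionary-based recognition as the earliest family of NER techniques for turning unstructured CTI text into structured records. The following is a minimal sketch of that idea, not any surveyed system's implementation: the entity types (`CVE`, `IP`, `SHA256`, `MALWARE`), the regular expressions, the three-name `MALWARE_DICTIONARY`, and the function name `extract_entities` are all hypothetical choices made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical lexicon of malware family names, stored lower-cased for lookup.
# A real system would load a large curated list; these three are illustrative.
MALWARE_DICTIONARY = {"emotet", "trickbot", "wannacry"}

# Hypothetical regular-expression rules for indicator-style entities.
PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SHA256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}

# Word-like tokens: a letter, then letters, digits, "_" or "-".
TOKEN = re.compile(r"[A-Za-z][\w-]*")


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity_type, surface_text) pairs in order of appearance."""
    found = []
    # Rule-based pass: every regex match becomes a typed entity.
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    # Dictionary pass: case-insensitive exact lookup of each token.
    for m in TOKEN.finditer(text):
        if m.group().lower() in MALWARE_DICTIONARY:
            found.append((m.start(), "MALWARE", m.group()))
    found.sort()  # order by character offset in the report
    return [(label, surface) for _, label, surface in found]


report = "Emotet operators exploited CVE-2017-0144 and beaconed to 192.168.10.5."
print(extract_entities(report))
# [('MALWARE', 'Emotet'), ('CVE', 'CVE-2017-0144'), ('IP', '192.168.10.5')]
```

Rules like these are precise for well-formed indicators, but a dictionary cannot recognize a malware name it has never seen, a limitation commonly cited as motivation for the learning-based approaches the survey goes on to cover.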
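Because the results are reported as F1 values, a short sketch of how NER F1 is typically computed may help in reading them. It assumes BIO tagging and exact-match, micro-averaged entity-level scoring, the usual convention for NER benchmarks; the abstract does not say which tagging scheme or scorer the authors used. The tag names (`B-MAL`, `B-IP`) and the helpers `bio_to_spans` and `entity_f1` are hypothetical placeholders, not the label sets of AutoLabel, DNRTI, CTIReports, or APTNER.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (entity_type, start_token, end_token_exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))
                label = None
            if tag != "O":
                # "B-X" opens an entity; a stray "I-X" is leniently treated as "B-X".
                label, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity-level F1 over a corpus of sentences."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # an entity counts only if type and boundaries both match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-MAL", "I-MAL", "O", "B-IP"]]
pred = [["B-MAL", "I-MAL", "O", "O"]]
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

Here precision is 1/1 and recall is 1/2, so F1 is 2/3. Micro-averaging pools counts across all sentences before dividing, and under exact matching a prediction with the right type but a boundary off by one token counts as both a false positive and a false negative.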