DataCon: Multi-domain, Large-scale Open Competition Data for Security Research
郑晓峰,段海新,陈震宇,应凌云,何直泽,汤舒俊,郑恩南,刘保君,陆超逸,沈凯文,张甲,陈卓,林子翔 |
(Institute of Network Science and Cyberspace, Tsinghua University, Beijing 100084, China; QI-ANXIN Technology Group Inc., Beijing 100088, China)
Abstract:
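Cyber security data is a fundamental resource for cyber security research and teaching; security data drawn from real-world scenarios, in particular, helps ensure that research and teaching outcomes match security practice. However, because cyber security technology changes rapidly, the field has many sub-domains, and the data is sensitive, finding suitable cyber security data has long been a central concern for researchers conducting research and for teachers delivering hands-on courses. This paper summarizes and analyzes classic public security datasets in several domains and finds that, in research use, they suffer from outdated data, small scale, and a high potential for harm. Overcoming the difficulties of selecting security data domains, acquiring large-scale real-world data, and opening data while protecting security and privacy, we construct the DataCon security dataset, which better meets current research needs. The dataset covers multiple domains at large scale, including DNS, malware, encrypted malicious traffic, botnets, and the cybercrime underground economy, all drawn from real-world scenarios, and it is opened to participants and researchers through the DataCon competition platform. The DataCon dataset currently includes all data from the four editions of the "DataCon Big Data Security Analysis Competition" held so far. The competition has been recognized as an exemplary case by the Ministry of Education of China and has been added to the graduate-admission bonus-point lists of several universities, and the data is continuously updated as attack and defense scenarios evolve in real network environments. The dataset continues to receive data-use requests from researchers and academic institutions and has supported the publication of several academic papers, demonstrating its effectiveness and usability. We hope the DataCon data and competition can help foster collaboration among industry, academia, and research institutions in cyber security.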
Keywords: DataCon; security research; open data; competition
DOI:10.19363/J.cnki.cn10-1380/tn.2024.01.09 |
Received: April 26, 2021; Revised: November 04, 2022
Funding: This work was supported by the National Natural Science Foundation of China (No. U1836213, No. U19B2034).
DataCon: Open Dataset for Large-scale Multiple Fields Security Research and Competitions |
ZHENG Xiaofeng,DUAN Haixin,CHEN Zhenyu,YING Lingyun,HE Zhize,TANG Shujun,ZHENG Ennan,LIU Baojun,LU Chaoyi,SHEN Kaiwen,ZHANG Jia,CHEN Zhuo,LIN Zixiang |
Institute of Network Science and Cyberspace, Tsinghua University, Beijing 100084, China;QI-ANXIN Technology Group Inc, Beijing 100088, China |
Abstract: |
Cyber security data is an essential resource for cyber security research and teaching; in particular, security data drawn from real-world scenarios helps ensure that research and teaching results are consistent with security practice. However, because of rapidly changing technology, numerous sub-fields, and data sensitivity, finding appropriate data has long been a central concern for researchers conducting research and for teachers delivering hands-on courses. In this paper, we summarize and analyze classic public security datasets in several fields and find that they have deficiencies in research and teaching applications, such as outdated data, small size, and a high potential for harm. Overcoming the difficulties of selecting security data fields, acquiring large-scale real-world data, and opening data while preserving security and privacy, we construct the DataCon security dataset, which better suits current research needs. The DataCon dataset covers DNS, malware, encrypted malicious traffic, botnets, the cybercrime underground industry, and other fields on a large scale, all drawn from real-world scenarios, and it is open to participants and researchers through the DataCon competition platform. At present, the DataCon dataset covers all the data of the "DataCon Big Data Security Analysis Competition", which has been held successfully four times. The competition has been evaluated as an excellent case by the Ministry of Education of the People's Republic of China and has been added to the graduate-admission bonus-point lists of many colleges and universities, and the data content is continuously updated along with changes in attack and defense scenarios in real network environments. The dataset continues to receive data-use applications from researchers and academic institutions and has supported the publication of multiple academic papers, demonstrating its effectiveness and usability.
We hope that the DataCon data and competition can help promote collaboration among industry, academia, and research in the field of cyber security.
Key words: DataCon; security research; open dataset; competitions
|
|
|
|
|
|