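Automatic Data Generation of DNS-Based Exfiltration for AI-Model Training

FENG Lin; CUI Xiang; WANG Zhongru; GAN Ruiling; DIAO Jiawen; HAN Dongxu; JIANG Hai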

Cite this article:

FENG Lin, CUI Xiang, WANG Zhongru, GAN Ruiling, DIAO Jiawen, HAN Dongxu, JIANG Hai. Automatic Data Generation of DNS-Based Exfiltration for AI-Model Training[J]. Journal of Cyber Security, 2021, 6(1): 1-16

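FENG Lin¹, CUI Xiang¹, WANG Zhongru², GAN Ruiling³, DIAO Jiawen³, HAN Dongxu⁴, JIANG Hai⁵

(1. Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China; 2. Chinese Academy of Cyberspace Studies, Beijing 100010, China; 3. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; 4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 5. Beijing DigApis Technology Co., Ltd., Beijing 100081, China)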

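Abstract:

In recent years, data theft that exploits the good concealment and penetration capability of the DNS protocol has become a favored TTP of many APT groups, and monitoring DNS traffic at the network boundary to accurately detect potential attacks has become a defensive capability that enterprises and public institutions urgently need to build. However, the malicious samples involved in DNS-based APT attacks are hard to obtain, few in number, and very low in activity, and mainstream data augmentation techniques are ill-suited to network attack and defense, a semantically sensitive domain; these problems constrain the training of AI detection models. To address this, based on an analysis of the DNS exfiltration attack mechanism and drawing on a large number of real APT cases and DNS tools, this paper proposes a method for automatically generating and applying DNS exfiltration traffic data based on attack TTPs, and designs and implements MalDNS, a system that automatically generates DNS exfiltration traffic data, to produce large-scale, high-fidelity DNS exfiltration datasets with adjustable completeness. Finally, experiments verify the validity of the generated traffic data and its effective support for training detection models.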

Keywords: automatic generation of DNS exfiltration data

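DOI: 10.19363/J.cnki.cn10-1380/tn.2021.01.01

Received: 2020-09-29; Revised: 2020-11-24

Funding: Supported by the Key-Area Research and Development Program of Guangdong Province (No. 02019B010136003, No. 2019B010137004) and the National Key Research and Development Program of China (No. 2018YFB0803504, No. 2019YFA0706404).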

Automatic Data Generation of DNS-Based Exfiltration for AI-Model Training

FENG Lin¹, CUI Xiang¹, WANG Zhongru², GAN Ruiling³, DIAO Jiawen³, HAN Dongxu⁴, JIANG Hai⁵

(1. Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China; 2. Chinese Academy of Cyberspace Studies, Beijing 100010, China; 3. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; 4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 5. Beijing DigApis Technology Co., Ltd., Beijing 100081, China)

Abstract:

In recent years, exfiltrating data over DNS, which takes advantage of the protocol's good concealment and its ability to pass through network boundaries, has become a favored TTP of many APT organizations. It is therefore imperative for enterprises and institutions to establish the defensive capability to monitor DNS traffic at the network boundary and accurately detect potential attacks. However, malicious samples from DNS-based APT campaigns are difficult to obtain, few in number, and largely inactive. In addition, mainstream data augmentation techniques do not transfer well to network attack and defense, a semantically sensitive field. These problems have restricted the training of AI detection models. Based on an analysis of the DNS-based exfiltration mechanism, combined with a large number of real APT cases and DNS-based exfiltration tools, we propose a method that automatically generates DNS-based exfiltration traffic data from attack TTPs and applies it to model training. We design and implement an automatic generation system named MalDNS to produce DNS-based exfiltration datasets that are large-scale, high-fidelity, and adjustable in completeness. Finally, our experiments verify that the generated traffic data is valid and that it effectively supports the training of detection models.

Key words: DNS-based exfiltration data generation