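Abstract:
Many illicit (black-market) groups currently evade regulators' domain blocking by building illicit websites in bulk and registering large numbers of illicit domain names. These bulk-registered domain names resemble one another to some degree, and this similarity lets researchers analyze known domain names and then study how unknown ones are generated. Unlike earlier approaches to illicit domain name generation, this paper recasts the generation task as a translation task. First, we use a Bi-LSTM to convert the domain names in the training data into representation vectors and apply hierarchical clustering so that similar domain names fall into the same cluster, with parameters set so that the number of domain names per cluster is as evenly distributed as possible. Next, we build the domain name pairs that the translation model requires from the clustering results. Finally, we use a Transformer model to learn automatically the latent transformation rules between similar domain names and to generate transformed illicit domain names. The generated domain names are checked with a two-stage illicit website detection model that we propose ourselves; by setting a confidence threshold, the model controls the size of the detection model and the data it requires, balancing recognition accuracy against efficiency. Experiments show that, among the domain names produced by the generation algorithm, 19.1% of the accessible ones are illicit, and the expansion factor for illicit domain names reaches 359.98; that is, one illicit domain name expands on average into nearly 360 new illicit domain names. The results demonstrate that the method is effective for transformation-based generation of illicit domain names and that it addresses two problems of existing harmful domain name generation methods: the scope of generation is hard to control, and many of the generated domain names are invalid.
Keywords: domain name generation algorithm; translation model; clustering; multimodal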
DOI: 10.19363/J.cnki.cn10-1380/tn.2025.03.11
Received: 2023-09-01; Revised: 2023-10-15
Funding: This work was supported by the National Key Research and Development Program of China (No. 2021YFB3100500).
|
DNTrans: Illicit Domain Name Transformation Generation Method Based on Transformer
WANG Bo, SHI Fan
|
(College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China)
Abstract:
Many illicit groups evade regulators' domain name blocking by creating illegal websites in bulk and registering large numbers of illicit domain names. These bulk-registered domain names share a degree of similarity, which allows researchers to analyze known domain names and then explore how unknown ones are generated. Unlike previous studies on illicit domain name generation, this paper treats domain name generation as a translation task. First, we use a Bi-LSTM to convert the domain names in the training data into representation vectors and then apply hierarchical clustering to group similar domain names, setting the clustering parameters so that cluster sizes are as even as possible. Next, we generate from the clustering results the domain name pairs needed to train the translation model. Finally, we use a Transformer model to learn the latent transformation rules between similar domain names automatically and to generate transformed illicit domain names. The generated domain names are evaluated with our own two-stage illicit website detection model, which sets a confidence threshold to control the detection model's size and the data it requires, balancing recognition accuracy against efficiency. Experiments show that 19.1% of the accessible generated domain names are illicit and that the expansion factor reaches 359.98; that is, a single illicit domain name yields, on average, nearly 360 new illicit domain names. These results demonstrate the method's effectiveness for transformation-based generation of illicit domain names and address two weaknesses of existing illicit domain name generation methods: the difficulty of controlling the scope of generation and the large number of invalid domain names produced.
Key words: domain name generation algorithm; translation model; clustering; multimodal
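
The clustering and pairing steps summarized in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: character-bigram counts stand in for the Bi-LSTM representation vectors, average linkage with cosine distance stands in for the unspecified hierarchical clustering settings, and the hypothetical parameters `max_distance` and `max_size` stand in for whatever parameters the authors use to keep cluster sizes even. The `.example` domain names are invented for illustration.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def bigram_vector(domain):
    """Character-bigram counts of the first label (assumed stand-in for a Bi-LSTM embedding)."""
    label = domain.split(".")[0]
    return Counter(label[i:i + 2] for i in range(len(label) - 1))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def cluster_domains(domains, max_distance=0.6, max_size=4):
    """Average-linkage agglomerative clustering with a cap on cluster size."""
    vectors = {d: bigram_vector(d) for d in domains}
    clusters = [[d] for d in domains]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i]) + len(clusters[j]) > max_size:
                    continue  # the size cap keeps clusters roughly balanced
                dist = linkage(clusters[i], clusters[j])
                if dist <= max_distance and (best is None or dist < best[0]):
                    best = (dist, i, j)
        if best is None:  # no admissible merge is left
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def make_pairs(clusters):
    """Each ordered (source, target) pair within a cluster is one translation example."""
    return [pair for c in clusters for pair in permutations(c, 2)]


domains = [
    "lucky001.example", "lucky002.example", "lucky003.example",
    "shopmall-a.example", "shopmall-b.example",
]
clusters = cluster_domains(domains)
pairs = make_pairs(clusters)
```

In this sketch, a sequence-to-sequence model would then be trained on `pairs`, reading the source domain name and emitting the target; the cap on cluster size also bounds how many pairs any single cluster contributes to the training set.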