Cite this article
  • 张慧妍,霍艺璇,赵佳鹏,王学宾,孙岩炜,赵璨,时金桥.去除主题偏差的暗网论坛作者归属分析方法[J].信息安全学报,已采用
  • Zhang Huiyan, Huo Yixuan, Zhao Jiapeng, Wang Xuebin, Sun Yanwei, Zhao Can, Shi Jinqiao. A Topic Debiasing Method for Author Attribution Analysis of Dark Web Forums[J]. Journal of Cyber Security, Accepted
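A Topic Debiasing Method for Author Attribution Analysis of Dark Web Forums
Zhang Huiyan1, Huo Yixuan1, Zhao Jiapeng1, Wang Xuebin2, Sun Yanwei1, Zhao Can2, Shi Jinqiao1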
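(1. Beijing University of Posts and Telecommunications; 2. Institute of Information Engineering, Chinese Academy of Sciences)
Abstract: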
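Because of their anonymity, dark web marketplaces are frequently used to trade illegal goods and services. To support author analysis and forensics on anonymous forum users who migrate frequently, the text of the many posts these users leave behind can be used to identify and profile them, making it possible to track the identities of anonymous users. However, the author representations learned by existing authorship attribution methods for dark web forums are affected by the topic distribution of the dataset, and performance drops markedly when the topic distributions of the training and test sets differ. This indicates that existing models capture spurious correlations between topic distribution and labels during training, so they fail to reach the expected predictive performance when the data distribution shifts. This paper uses Structural Causal Models (SCM) to explain why text topics introduce bias into the authorship attribution task, brings causal inference into text-based authorship attribution, and proposes a causal debiasing method that mitigates the effect of topic distribution mismatch on author representations. The method reduces the model's reliance on topic distribution information by keeping the potential causal effect of the same author's representation invariant across topics, and thereby learns author representations whose performance is more stable across topics. Experiments on four dark web forum datasets show that the method improves MRR and Recall@10 by 7.25% and 6.33% on average, respectively.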
Keywords: authorship attribution, dark web forum, user analysis
DOI:
Submitted: 2024-11-16    Revised: 2025-02-28
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
A Topic Debiasing Method for Author Attribution Analysis of Dark Web Forums
Zhang Huiyan1, Huo Yixuan1, Zhao Jiapeng1, Wang Xuebin2, Sun Yanwei1, Zhao Can2, Shi Jinqiao1
(1. Beijing University of Posts and Telecommunications; 2. Institute of Information Engineering, Chinese Academy of Sciences)
Abstract:
Darknet marketplaces are often used to trade illegal goods and services because of their anonymity. To better analyze anonymous forum users who migrate frequently, the text in the large number of posts they leave behind can be used to identify and profile them, so that the identities of anonymous users can be tracked. However, the author representations learned by existing authorship attribution methods for dark web forums are affected by the topic distribution of the dataset, and performance degrades significantly when the topic distribution of the training set differs from that of the test set. This means that existing models capture spurious correlations between topic distribution and labels during training, which prevents them from achieving the expected predictions when the data distribution changes. This paper uses Structural Causal Models (SCM) to explain why text topics introduce bias into the authorship attribution task, introduces causal inference into text-based authorship attribution, and proposes a causal debiasing method to alleviate the influence of topic distribution inconsistency on author representations. The method reduces the model's dependence on topic distribution information by keeping the potential causal effect of the same author's representation invariant across topics, and so learns author representations with more stable performance under each topic. Experimental results on four dark web forum datasets show that the proposed method improves MRR and Recall@10 by an average of 7.25% and 6.33%, respectively.
Keywords: authorship attribution, dark web forum, user analysis
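
The abstract reports its results as MRR and Recall@10. As a reminder of what these ranking metrics measure, the following is a minimal Python sketch. The function names, candidate lists, and author IDs are hypothetical illustrations and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], true_author: str) -> float:
    """Return 1/rank of the true author in a ranked candidate list, or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == true_author:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries (MRR)."""
    return sum(reciprocal_rank(r, t) for r, t in zip(rankings, truths)) / len(truths)


def recall_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int = 10) -> float:
    """Fraction of queries whose true author appears among the top-k candidates."""
    hits = sum(1 for r, t in zip(rankings, truths) if t in r[:k])
    return hits / len(truths)


# Hypothetical example: three queries, each with a ranked list of candidate author IDs.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]

print(round(mean_reciprocal_rank(rankings, truths), 4))  # 0.6111
print(round(recall_at_k(rankings, truths, k=2), 4))      # 0.6667
```

Here the true author is ranked first, second, and third across the three queries, so MRR is (1 + 1/2 + 1/3) / 3, and two of the three queries place the true author within the top 2.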