Dark Triad Psychology Theory Based Hate Response Prediction Method via Large Language Model

Wang Ye; Zeng Haoyuan; Zeng Xiang; Jia Yan; Zhou Bin

Cite this article:

Wang Ye, Zeng Haoyuan, Zeng Xiang, Jia Yan, Zhou Bin. Dark Triad Psychology Theory Based Hate Response Prediction Method via Large Language Model[J]. Journal of Cyber Security, accepted.

Wang Ye¹, Zeng Haoyuan², Zeng Xiang¹, Jia Yan¹, Zhou Bin¹
(1. National University of Defense Technology; 2. iFLYTEK Co., Ltd.)

Abstract:

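Hate speech that spreads through online social media tends to intensify conflict between groups, amplify mutual misunderstanding, trigger violent incidents, and undermine rational public discourse. Automatic response prediction is therefore of practical importance: it helps content creators and platforms assess the likely impact of a post once it is published, and it helps curb the spread of hate speech. Previous studies have made notable progress on response prediction through personalized modeling based on users' historical behavior and user profiles. However, these studies have two main limitations. First, feeding user profiles and historical posts directly into the model tends to introduce irrelevant content and thus noise. Second, methods that rely only on user history and user profiles can reflect only a user's past behavior and cannot uncover the user's core psychological attributes. In addition, existing work mainly predicts the sentiment intensity and sentiment polarity of a user's reply to a given post; few studies apply response prediction to hate governance, and no relevant dataset exists. To address these problems, we first construct a Chinese hate response prediction dataset, which fills the data gap for response prediction in hate governance and in the Chinese-language context and provides a foundation for subsequent research. Second, we propose DARKSENSE, a large language model prompting technique based on the Dark Triad psychological theory. DARKSENSE draws on the Dark Triad and other psychological theories to extract users' core psychological attributes and generate more representative user descriptions, which significantly improves the accuracy of response prediction. Finally, we extend the response prediction task to hate governance: the method not only predicts the hate polarity of the tweet a user is about to publish but also accurately identifies its potential hate target, enabling finer-grained analysis and intervention.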

Keywords: social networks; hate response prediction; hate speech; hate polarity; hate target; hate intensity

DOI：

Submitted: 2025-09-23; Revised: 2026-03-17

Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)

Dark Triad Psychology Theory Based Hate Response Prediction Method via Large Language Model

Wang Ye¹, Zeng Haoyuan², Zeng Xiang¹, Jia Yan¹, Zhou Bin¹

(1. National University of Defense Technology; 2. iFLYTEK Co., Ltd.)

Abstract:

Hate speech on social media is an increasingly severe problem. Automatic response prediction has practical significance both for helping content creators assess the potential impact of a post and for curbing the spread of hate speech. Previous research has made significant progress on response prediction through personalized modeling based on users' historical behavior and user descriptions. However, these studies have two major limitations. First, they feed user descriptions and historical posts directly into the model, which easily introduces irrelevant content and generates noise. Second, methods that rely only on user history and user descriptions can reflect only users' past behavioral characteristics and cannot uncover their core psychological attributes. In addition, existing research mainly predicts the emotional intensity and emotional polarity of users' responses to specific posts; applications to hate governance are rare, and relevant datasets are lacking. To address these problems, we first construct a Chinese hate response prediction dataset, which fills the data gap for response prediction in hate governance and in the Chinese context and provides an important foundation for subsequent research. Second, we propose DARKSENSE, a large language model prompting technique based on the Dark Triad psychological theory. DARKSENSE combines the Dark Triad with other psychological theories to extract users' core psychological attributes and generate more representative user descriptions, which significantly improves the accuracy of response prediction. Finally, we extend the response prediction task to hate governance, not only predicting the hate polarity of users' upcoming tweets but also accurately identifying their potential hate targets, which enables more fine-grained analysis and intervention.

Keywords: social networks; hate response prediction; hate speech; hate polarity; hate target; hate intensity