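Abstract: |
Online rumors can disturb people's thinking, psychology, and behavior, trigger social unrest, and endanger public safety, and the widespread use of social platforms such as Weibo amplifies their impact and harm. Rumor detection is therefore important for the orderly and healthy development of cyberspace. Current automatic rumor detection techniques focus mainly on building detection models and on the representation of input data, while little research addresses improving data quality to improve rumor identification. Accordingly, this paper applies rough set theory to incomplete rumor information systems for knowledge acquisition and decision making. In essence, rough set theory is used to address the uncertainty measurement, redundancy, and incompleteness of incomplete rumor information systems, so as to obtain high-quality data and improve rumor detection. First, the paper systematically summarizes four uncertainty measures in rough set theory, namely Shannon entropy, rough entropy, Liang entropy, and information granularity, and organizes and derives their consistent extension from complete to incomplete information systems. Based on these four measures, a knowledge reduction algorithm based on Maximum Correlation Minimum Redundancy (MCMR) is proposed. The method is entropy-based and jointly considers decision information and redundant noise. Experiments on 8 datasets, including UCI and Weibo datasets, show that the proposed algorithm outperforms several baseline algorithms and effectively resolves the redundancy of the information system. In addition, an incomplete decision tree algorithm based on maximal consistent blocks is proposed; experiments on data with different degrees of missingness show that it effectively resolves the incompleteness of the information system. |
Keywords: rumor detection; rough set; incomplete information system; maximum correlation minimum redundancy; maximal consistent block |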
DOI:10.19363/J.cnki.cn10-1380/tn.2024.03.02 |
Received: May 24, 2022; Revised: August 19, 2022 |
Funding: This work was supported by the Zhongyuan Talent Program (中原英才计划, No. 212101510002). |
|
Knowledge Acquisition and Decision Making in Incomplete Rumor Information System based on Rough Set |
WANG Biao, WEI Hongquan, WANG Kai, LIU Shuxin, JIANG Haocong |
PLA Strategic Support Force Information Engineering University, Zhengzhou 450002, China; PLA Strategic Support Force Information Engineering University, Zhengzhou 450002, China; National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China |
Abstract: |
Online rumors may disrupt people’s thoughts, psychology, and behavior, cause social unrest, and endanger public safety. The widespread use of social platforms such as Weibo amplifies the impact and harm of rumors, so rumor detection is of great significance to the orderly and healthy development of cyberspace. Current automatic rumor detection techniques focus mainly on the construction of detection models and the representation of input data, while little research addresses improving data quality to improve detection performance. Motivated by this gap, this paper applies rough set theory to incomplete rumor information systems for knowledge acquisition and decision making. In essence, rough set theory is used to address the uncertainty measurement, redundancy, and incompleteness of an incomplete rumor information system, so as to obtain high-quality data and improve rumor detection. First, the paper systematically summarizes four uncertainty measures in rough set theory, namely Shannon entropy, rough entropy, Liang entropy, and information granularity, and organizes and derives their consistent extension from complete information systems to incomplete information systems. Based on these four measures, a knowledge reduction algorithm based on Maximum Correlation Minimum Redundancy (MCMR) is proposed. The method is entropy-based and jointly considers decision information and redundant noise. Experiments on 8 datasets, including UCI and Weibo datasets, show that the proposed algorithm outperforms several baseline algorithms and effectively resolves the redundancy of the information system. In addition, the paper proposes an incomplete decision tree algorithm based on maximal consistent blocks. Experiments on data with different degrees of missingness show that this algorithm effectively resolves the incompleteness of the information system. |
Key words: rumor detection; rough set; incomplete information system; maximum correlation minimum redundancy; maximal consistent blocks |
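
The abstract names four uncertainty measures without stating their formulas. As a rough illustration only, the sketch below computes one common tolerance-class formulation of each measure for an incomplete information system. The definitions are assumed from the general rough-set literature and may differ from the ones derived in the paper; the `*` missing-value marker, the function names, and the toy table are all hypothetical.

```python
import math

MISSING = "*"  # assumed marker for a missing attribute value


def tolerant(x, y):
    """Two objects are tolerant if every attribute matches or is missing in either."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))


def tolerance_classes(rows):
    """For each object, the set of indices of the objects tolerant with it."""
    return [{j for j, y in enumerate(rows) if tolerant(x, y)} for x in rows]


def uncertainty_measures(rows):
    """Assumed tolerance-class forms of the four measures named in the abstract."""
    n = len(rows)
    sizes = [len(c) for c in tolerance_classes(rows)]
    return {
        # Shannon-type entropy: -(1/n) * sum(log2(|S(u)| / n))
        "shannon_entropy": -sum(math.log2(s / n) for s in sizes) / n,
        # Rough entropy: (1/n) * sum(log2(|S(u)|))
        "rough_entropy": sum(math.log2(s) for s in sizes) / n,
        # Liang entropy: (1/n) * sum(1 - |S(u)| / n)
        "liang_entropy": sum(1 - s / n for s in sizes) / n,
        # Information granularity: (1/n^2) * sum(|S(u)|)
        "granularity": sum(sizes) / n ** 2,
    }


# Hypothetical toy table: four objects, two attributes, "*" marks a missing value.
toy_rows = [("a", "x"), ("a", "*"), ("b", "x"), ("*", "y")]
toy_measures = uncertainty_measures(toy_rows)
```

Under these assumed definitions, Liang entropy and information granularity sum to 1, and Shannon entropy and rough entropy sum to log2 of the number of objects, which gives a quick sanity check on any implementation. When no values are missing, the tolerance classes reduce to ordinary equivalence classes, so the same code covers the complete case.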