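The Method for Identifying the Conversation Responding Relationships using Graph Representation Learning

LIANG Yongming; TIAN Tian; YANG Xiaoyu; ZHANG Xi; QIU Lirong

Cite this article: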

LIANG Yongming, TIAN Tian, YANG Xiaoyu, ZHANG Xi, QIU Lirong. The Method for Identifying the Conversation Responding Relationships using Graph Representation Learning[J]. Journal of Cyber Security, 2021, 6(5): 199-214

LIANG Yongming^1,2, TIAN Tian^1,2, YANG Xiaoyu^1,2, ZHANG Xi^1,2, QIU Lirong^2,3
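(1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China; 3. School of Computer (National Demonstrative School of Software), Beijing University of Posts and Telecommunications, Beijing 100876, China)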

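Abstract:

Social media platforms such as WeChat, QQ, and DingTalk all provide many-to-many group chat. These chat groups contain massive amounts of information, and analyzing group chat content effectively to extract valuable related information is a current research focus. Interaction among users is the main function of a group, and replying to messages is how that interaction takes place; the reply behavior between messages conceals relationships among messages and among users. Replies between group messages are usually implicit and non-contiguous: most group messages do not specify an explicit reply relationship, the current message is not necessarily a reply to the immediately preceding message, and the reply relationship has to be determined from the specific chat scenario. When reply relationships between messages are not explicitly specified, group chat content is hard to analyze and understand, which hinders holistic analysis of that content. Addressing the reply relationships between group messages, this paper proposes a method for judging message reply relationships based on graph representation learning. Unlike previous methods, which use only some of the group elements, the method jointly learns from the text of each message, information about the user who sent it, and its context. It builds a group graph and generates an adaptive message graph from the group content, yielding a graph structure composed of multiple kinds of group element information and the relationships among those elements. A graph model then performs representation learning for group messages on this structure and outputs a representation vector for each message; the representation vectors of a message pair are concatenated and used to predict the reply relationship between the two messages. While learning reply relationships, the graph model updates the message nodes in the graph through task learning and simultaneously updates the vector representations of the user nodes; user vector analysis experiments verify the validity and reasonableness of the user vectors output by the model. Comparative experiments and significance tests were conducted on a public dataset and an annotated dataset. The results show that the model substantially outperforms the baseline models on multiple evaluation metrics; for example, its F1 score is nearly 20% higher than that of a sentence-pair classification model that relies solely on BERT.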

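Keywords: graph model; dialogue system; message reply; natural language inference; conversation analysis; adaptive graph construction; group analysis

DOI: 10.19363/J.cnki.cn10-1380/tn.2021.09.15

Received: 2021-04-30; Revised: 2021-08-09

Funding: This work was supported by the National Natural Science Foundation of China (61976026).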

The Method for Identifying the Conversation Responding Relationships using Graph Representation Learning

LIANG Yongming^1,2, TIAN Tian^1,2, YANG Xiaoyu^1,2, ZHANG Xi^1,2, QIU Lirong^2,3

(1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China; 3. School of Computer (National Demonstrative School of Software), Beijing University of Posts and Telecommunications, Beijing 100876, China)

Abstract:

Social media platforms such as WeChat, QQ, and DingTalk all provide many-to-many chat groups. These chat groups contain a large amount of information, and effectively analyzing group chat content to obtain valuable related information is a current research hotspot. Interaction between users is the main function of a group, and replying to messages is how users interact. The reply behavior between messages conceals relationships among messages and among users. Replies between group messages are usually implicit and non-contiguous: most group messages do not specify an explicit reply relationship, and the current message is not necessarily a reply to the immediately preceding message, so the reply relationship must be determined from the specific chat scenario. When reply relationships are not explicitly specified, the group chat content is hard to analyze and understand, which hinders holistic analysis of that content. This paper addresses the reply relationships between group messages and proposes a method for judging them based on graph representation learning. Unlike previous methods, which use only some of the group elements, this method jointly learns from the text of each message, information about its sender, and its context. It constructs a group graph and generates an adaptive message graph from the group content, obtaining a graph structure composed of multiple kinds of group element information and the relationships among elements. A graph model learns representations of group messages on this graph structure and outputs a representation vector for each message; the representation vectors of a message pair are then concatenated and used to predict the reply relationship between the two messages.
While learning the reply relationships between messages, the graph model updates the message nodes in the graph through task learning and, at the same time, updates the vector representations of the user nodes. User vector analysis experiments verify the validity and reasonableness of the user vectors output by the model. Comparative experiments and significance tests were carried out on a public dataset and a labeled dataset. The results show that the model substantially outperforms the comparison models on multiple evaluation metrics; for example, its F1 score is nearly 20% higher than that of a sentence-pair classification model that relies solely on BERT.

Keywords: graph model; dialogue system; conversation responding relationships; natural language inference; conversation analysis; adaptive graph construction; group analysis
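
The pipeline summarized in the abstract (build a graph over messages and users, learn message representations on it, concatenate the vectors of a message pair, and predict whether one replies to the other) can be illustrated with a small sketch. This is not the authors' implementation. It is a toy under stated assumptions: the hashed bag-of-words encoder, the fixed context window (standing in for the adaptive message graph), the single round of mean aggregation (standing in for the graph model), and the untrained logistic scorer are all hypothetical choices made for illustration, as is the sample chat data.

```python
import math
from collections import defaultdict

# Made-up group chat: (message_id, user, text). Illustration only.
messages = [
    ("m1", "alice", "anyone free for lunch"),
    ("m2", "bob", "is the server down"),
    ("m3", "carol", "lunch sounds good"),
]

def text_vector(text, dim=8):
    """Hypothetical stand-in for a text encoder: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def build_group_graph(messages, window=1):
    """Link each message to its sender and to the preceding `window` messages.

    Assumption: a fixed window replaces the paper's adaptive message graph.
    """
    neighbors = defaultdict(set)
    for i, (mid, user, _) in enumerate(messages):
        neighbors[mid].add(user)
        neighbors[user].add(mid)
        for j in range(max(0, i - window), i):
            prev = messages[j][0]
            neighbors[mid].add(prev)
            neighbors[prev].add(mid)
    return neighbors

def aggregate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def pair_features(features, a, b):
    """Concatenate the representation vectors of a message pair."""
    return features[a] + features[b]

def reply_probability(pair_vec, weights, bias=0.0):
    """Logistic score for 'b replies to a'; the weights here are untrained."""
    z = sum(w * x for w, x in zip(weights, pair_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))

dim = 8
# Message nodes start from their text vectors; user nodes start at zero.
features = {mid: text_vector(text, dim) for mid, _, text in messages}
for _, user, _ in messages:
    features.setdefault(user, [0.0] * dim)

graph = build_group_graph(messages)
features = aggregate(features, graph)  # message and user nodes are both updated

pair = pair_features(features, "m1", "m3")
weights = [0.1] * (2 * dim)  # placeholder weights, not learned
prob = reply_probability(pair, weights)
```

Because user nodes sit in the same graph as message nodes, the aggregation step updates user vectors as a side effect of learning message representations, which mirrors the abstract's remark that user node vectors are updated alongside message nodes. In a real system the weights would be trained on labeled reply pairs rather than fixed by hand.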