Abstract:
Interpretability increases users' trust in a recommendation system and improves its persuasiveness and transparency, so much work has been devoted to making recommendation systems interpretable. Reviews contain rich information that reflects user preferences and sentiment as well as the characteristics of the corresponding items, and several recent deep review-based recommendation systems have exploited them to effectively improve recommendation interpretability. The attention mechanisms built into these systems can identify useful semantic units (e.g., words, aspects, or entire reviews) from the corresponding reviews; the system makes its decisions through these highly weighted semantic units, which enhances its interpretability. In most work, however, interpretability is treated only as an auxiliary subtask, demonstrated through qualitative comparisons in a few case studies, and to date there has been no method for comprehensively evaluating the interpretability of review-based recommendation systems. This paper first divides interpretable review-based recommendation systems into three categories according to how their attention weights are computed: attention-based, interaction-based, and aspect-based recommendation systems. It then selects five state-of-the-art deep review-based recommendation systems and, using the review weights produced by each system's built-in attention mechanism, performs manual annotation on three real-world datasets to quantitatively evaluate each system's interpretability. The annotation results show that the interpretability of the different deep review-based recommendation systems varies in quality, but all current systems capture the user's preference information in the target reviews with a probability above one half. Among the five systems evaluated, none holds an absolute advantage across all the data; that is, the systems complement one another in terms of recommendation interpretability. Further data analysis shows that when a system predicts ratings more accurately, the highly weighted information obtained through its attention mechanism indeed better reflects user preferences or item characteristics, indicating that the built-in attention mechanism improves prediction accuracy while improving interpretability. We also find that systems capture feature information more easily in shorter reviews than in longer ones, and that systems with higher interpretability scores are more likely to assign higher weights to adjectives. This work also offers some insights for further research on interpretability evaluation and for exploring better review-based recommendation solutions.
Keywords: recommendation systems; attention mechanism; interpretability; user reviews; deep learning
DOI: 10.19363/J.cnki.cn10-1380/tn.2021.09.10
Received: 2021-04-30; Revised: 2021-08-05
Funding: This work was supported by the National Natural Science Foundation of China (No. 61872278).
|
Research on Interpretability of Recommendation System based on Text Attention Mechanism |
ZHU Rui, LIU Bulou, LIU Yiyu, ZOU Xinyu, LI Chenliang
School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: |
Interpretability can enhance users' trust in recommendation systems and improve their persuasiveness and transparency. So far, many efforts have been devoted to achieving recommendation interpretability. The rich information provided in user reviews can reflect a user's preferences and consumption experience, as well as the corresponding item's features. Hence, recent deep review-based recommendation systems capitalize on reviews for accurate and interpretable recommendation and have advanced this purpose significantly. The built-in attention module devised in these deep review-based recommendation systems can identify semantic units (e.g., words, aspects, or individual reviews) from the corresponding reviews, which also facilitates the interpretability of the recommendation systems. However, interpretability is typically taken as an auxiliary subtask, where examples are used as case studies for qualitative comparison to show that the recommendation system is interpretable. To date, there has been no comprehensive evaluation of how good the interpretability delivered by these review-based recommendation systems is. In this paper, according to the different calculation methods of attention weights, we first summarize existing deep review-based recommendation systems into three categories: attention-based, interaction-based, and aspect-based recommendation systems. Then, we perform a human evaluation based on the built-in attention mechanisms of five state-of-the-art deep review-based recommendation systems across three real-world datasets, covering all three categories for interpretability evaluation. The annotation results suggest that the interpretability of different deep review-based recommendation systems differs, but the current deep review-based recommendation systems can uncover the user's preference for the target item more than half of the time.
We also note that, among the five recommendation systems evaluated, there is no absolute winner in discovering user preference across all cases. That is, the models are complementary to each other in terms of recommendation interpretability. Through further data analysis, we find that higher recommendation accuracy often indicates that the highlighted information in the reviews is indeed relevant to the user's preferences or the item's features. This shows that the built-in attention mechanism of a recommendation system can not only enhance interpretability but also improve prediction accuracy. Moreover, we find that, compared with long reviews, recommendation systems capture feature information in shorter reviews more easily, and that recommendation systems with high interpretability scores are more likely to assign higher weights to adjectives. Overall, this work sheds some light on further research towards the development of interpretability evaluation and better review-based recommendation system solutions.
Key words: recommendation systems; attention mechanism; interpretability; user reviews; deep learning
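As an illustration of the review-level attention mechanism discussed in the abstract, the following is a minimal sketch of how such a module typically scores and aggregates review embeddings. All names, dimensions, and the dot-product scoring function are assumptions for illustration only, not details taken from the paper or from any of the five evaluated systems.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def review_attention(review_embeddings, query):
    """Score each review embedding against a user/item query vector,
    normalize the scores into attention weights, and return both the
    weights and the attention-weighted summary vector.

    review_embeddings: (n_reviews, dim) array, one row per review.
    query: (dim,) vector representing the user-item context.
    """
    scores = review_embeddings @ query      # (n_reviews,) relevance scores
    weights = softmax(scores)               # non-negative, sum to 1
    summary = weights @ review_embeddings   # (dim,) weighted combination
    return weights, summary

# Toy example: 3 reviews embedded in a 4-dimensional space.
rng = np.random.default_rng(0)
reviews = rng.normal(size=(3, 4))
query = rng.normal(size=4)
w, s = review_attention(reviews, query)
```

The weights `w` are the quantities that the paper's human annotators would inspect: a high-weight review is the one the system claims to have based its prediction on, so interpretability can be judged by whether that review actually reflects the user's preference.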