Abstract:
Recommender systems play an important role in helping users discover the information they are interested in from massive amounts of data. In recent years, deep learning has achieved remarkable success in many fields such as computer vision, attracting growing attention from researchers in the recommendation community. Combining recommender systems with deep learning methods such as graph neural networks has produced impressive results. However, most existing work focuses on how to design the architecture of recommender systems with deep models, while little attention has been paid to the optimization framework, especially to improving training efficiency from that perspective. As a result, as models grow more complex, the time cost of training them keeps increasing. In this work, we aim to improve the training efficiency of large-scale graph recommendation models from the perspective of the optimization framework. The most widely used optimization framework in recommendation is Bayesian personalized ranking (BPR), whose underlying assumption is that a target user prefers interacted items over non-interacted ones; it realizes this by maximizing the score margin between items the user is interested in and those the user is not. However, the bottleneck of the BPR optimizer lies in its inefficient learning of model parameters, which, given limited computing resources and the need to keep user interests up to date, severely limits the application of mainstream graph recommendation models in industrial scenarios. The root cause is that BPR passes every training sample pair through a non-linear activation function individually; such element-wise computation cannot be converted into matrix operations or other parallel forms, and therefore fails to exploit the parallel acceleration of GPUs. Inspired by the observation that the squared-error loss is friendly to matrix-form computation when combined with recommendation tasks, we design a fast non-sampling optimizer, FGL, which is broadly applicable to mainstream graph recommendation models. Through a series of theoretical derivations and transformations, FGL effectively avoids the high-complexity terms in the loss function and thus greatly improves training efficiency. Taking the classic matrix factorization model and the state-of-the-art graph neural network model LightGCN as representatives, we conduct extensive experiments on four benchmark datasets. The results show that, while maintaining recommendation accuracy, the FGL optimizer achieves orders-of-magnitude speed-ups in training over BPR, indicating great application potential in real-world industrial scenarios.
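For reference, the BPR objective discussed above is commonly written as follows; the notation (predicted score, sigmoid, sampled triple set, regularization weight) is introduced here for illustration and does not appear in the original text.

```latex
% Standard BPR objective: maximize the score margin between an interacted
% item i and a sampled non-interacted item j for each user u.
%   \hat{y}_{ui} : predicted preference of user u for item i
%   \sigma       : sigmoid function
%   \mathcal{D}  : set of sampled training triples (u, i, j)
%   \lambda \lVert \Theta \rVert_2^2 : L2 regularization on parameters \Theta
\mathcal{L}_{\mathrm{BPR}}
  = -\sum_{(u,i,j) \in \mathcal{D}} \ln \sigma\!\left(\hat{y}_{ui} - \hat{y}_{uj}\right)
  + \lambda \lVert \Theta \rVert_2^2
```

Because each sampled triple passes through the log-sigmoid individually, this sum cannot be rewritten as a single dense matrix product, which is the element-wise bottleneck the abstract refers to.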
Keywords: recommendation systems, graph neural networks, optimization framework, training efficiency
DOI:10.19363/J.cnki.cn10-1380/tn.2021.09.08 |
Submitted: 2021-04-30; Revised: 2021-08-08
Funding: This work was supported by the National Natural Science Foundation of China (No. U19A2079, No. 61972372).
|
A Faster Optimization Mechanism for Large-scale Graph Recommendation Models |
YANG Zhengyi, WU Bin, WANG Xiang, HE Xiangnan
School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; NExT++ Center, National University of Singapore, Singapore 119077, Singapore; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; School of Data Science, University of Science and Technology of China, Hefei 230026, China
Abstract: |
Recommender systems play an important role in helping users find the information they are personally interested in amid overwhelming amounts of data. In recent years, deep learning has shown extraordinary performance in a wide variety of fields such as computer vision, attracting more and more attention from researchers in the field of recommender systems. Many deep learning models, such as graph neural networks, have been applied to recommendation and achieved state-of-the-art performance. However, existing methods have mainly focused on how to design the architecture of recommender systems with deep models, while few works have considered the optimization framework or how to improve training efficiency with respect to it. Consequently, as models become more complex, training them takes more and more time. In this work, we aim to speed up the training of large-scale graph recommendation models from the perspective of the optimization framework. The most widely used optimization framework for recommendation is Bayesian personalized ranking (BPR), which assumes that a target user prefers the items he or she has interacted with over the others, and realizes this by maximizing the margin between the predicted scores of interested and uninterested items. However, the bottleneck of BPR lies in the inefficiency of model parameter learning, which severely limits industrial applications of mainstream graph recommendation models, since computation resources are limited and user interests should be updated as promptly as possible. The fundamental reason lies in the formulation of BPR: every training tuple is transformed by a non-linear activation function separately, with only element-wise computation and no matrix or vector operations, which fails to fully leverage the speed-up capability of GPUs designed for parallel computing. Inspired by the fact that the square loss is more friendly to matrix operations when combined with recommendation tasks, we propose a fast and generic learner, FGL, which requires no sampling and can be widely applied to modern graph recommendation models. Through a series of theoretical derivations, the high-complexity terms in FGL can be effectively avoided, thus greatly accelerating model learning. Extensive experiments were conducted with the classic matrix factorization (MF) model and the state-of-the-art graph neural network model LightGCN on four benchmark datasets. The results show that our optimization method is faster than BPR by several orders of magnitude while achieving comparable or even better accuracy, which indicates its potential for application in real-world industrial tasks.
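To make the efficiency argument concrete, the sketch below contrasts the element-wise BPR loss with a non-sampling weighted square loss written in matrix form, using the standard Gram-matrix decomposition of the all-item term. This is a minimal PyTorch sketch of the generic technique, with assumed function names and an assumed uniform negative weight c0; it is not the paper's actual FGL implementation.

```python
import torch


def bpr_loss(user_emb, pos_emb, neg_emb):
    """Standard BPR loss over sampled (user, pos, neg) triples.

    Each triple passes through a log-sigmoid individually, so the
    computation is element-wise over the sampled pairs.
    """
    pos_scores = (user_emb * pos_emb).sum(dim=-1)   # (B,)
    neg_scores = (user_emb * neg_emb).sum(dim=-1)   # (B,)
    return -torch.log(torch.sigmoid(pos_scores - neg_scores)).mean()


def nonsampling_square_loss(user_emb, item_emb, pos_item_ids, c0=0.1):
    """Whole-data weighted square loss in matrix form (illustrative only).

    user_emb:     (B, d) embeddings of the users in the batch
    item_emb:     (N, d) embeddings of ALL items
    pos_item_ids: list of LongTensors, positive items of each user
    c0:           assumed uniform weight on unobserved (negative) entries

    The term over all N items is rewritten with the d x d Gram matrix
    G = item_emb^T @ item_emb, so no per-(user, item) loop is needed.
    """
    gram = item_emb.t() @ item_emb                        # (d, d), shared by all users
    loss_all = c0 * ((user_emb @ gram) * user_emb).sum()  # c0 * sum_u p_u^T G p_u
    loss_pos = 0.0
    for u, items in enumerate(pos_item_ids):              # only observed entries
        y = user_emb[u] @ item_emb[items].t()             # (|I_u^+|,)
        loss_pos = loss_pos + ((1.0 - y) ** 2 - c0 * y ** 2).sum()
    return loss_pos + loss_all
```

Under these assumptions the all-item term costs O(|I| d^2 + B d^2) per batch instead of O(B |I| d), which is the kind of matrix-friendly rewrite the abstract attributes to the square loss.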
Key words: recommendation systems, graph neural networks, optimization framework, training efficiency