Personalized Real-Time Movie Recommendation System: Practical Prototype and Evaluation
2023-03-02 15:05:52
Jiang Zhang, Yufeng Wang*, Zhiyuan Yuan, Qun Jin
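Abstract: With the explosion of big data, practical recommendation schemes have become very important in many fields, including e-commerce, social networks, and a number of web-based services. Many personalized movie recommendation schemes now exist that use publicly available movie datasets (e.g., MovieLens and Netflix) and report improved performance metrics (e.g., Root-Mean-Square Error (RMSE)). However, movie recommendation systems face two notable issues: first, scalability, and second, practical usage feedback and verification based on a real implementation. In particular, Collaborative Filtering (CF) is one of the main prevailing techniques for implementing recommendation systems. Traditional CF schemes, however, suffer from a time complexity problem that makes them unsuitable for real-world recommendation systems. In this paper, we address both issues. First, a simple and efficient recommendation algorithm is proposed, which exploits users' profile attributes to partition users into several clusters. For each cluster, a virtual opinion leader is designed to represent the whole cluster, which greatly reduces the dimension of the original user-item matrix. A Weighted Slope One-VU method is then designed and applied to the virtual opinion leader-item matrix to obtain the recommendation results. Compared with traditional clustering-based CF recommendation schemes, the proposed method significantly reduces time complexity while achieving comparable recommendation performance. Furthermore, we have built a real web-based personalized movie recommendation system, MovieWatch, opened it to the public, collected user feedback on its recommendations, and used this real-world data to evaluate the feasibility and accuracy of our system.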
1 Introduction
Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
resources to process information and form recommendations. The majority of these resources are consumed in determining users with similar tastes and items with similar descriptions. Therefore, CF algorithms face a scalability problem, which can become an important factor for a recommendation system. If the problem is not solved, it is difficult to produce real-time recommendations.
Most existing movie recommendation schemes have worked on improving performance metrics, such as Root-Mean-Square Error (RMSE), utilizing publicly available movie datasets (e.g., MovieLens). The popular methodology is to adopt the so-called 8:2 cross validation, i.e., 80% of the MovieLens data are used as a training set, while the other 20% are used for testing. However, it is reported that roughly 80% of the publications in this field describe problems or future work that focus on the implementation or verification of systems. This result highlights the importance of these areas in recommendation system development, and shows a need to collect real user feedback on movie recommendations through a practically deployed movie recommendation system, and to compare this real-life data with public dataset-driven research.
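As a rough illustration of the 8:2 evaluation methodology described above, the following minimal Python sketch splits a set of rating triples 80%/20% and computes RMSE for a trivial baseline. The toy ratings, the `split_80_20` helper, and the global-mean predictor are assumptions made for this example only; they are not the data or the predictor used in the paper.

```python
import math
import random

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def split_80_20(ratings, seed=0):
    """Shuffle (user, item, rating) triples and split them 80% / 20%."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy data, made up for illustration (not taken from MovieLens):
# 10 users x 10 items with ratings in 1..5.
ratings = [(u, i, 1 + (u * 7 + i * 3) % 5) for u in range(10) for i in range(10)]
train, test = split_80_20(ratings)

# Baseline predictor (an assumption): always predict the training-set mean.
global_mean = sum(r for _, _, r in train) / len(train)
baseline_rmse = rmse([global_mean] * len(test), [r for _, _, r in test])
print(len(train), len(test), round(baseline_rmse, 3))
```

Any real recommender would replace the global-mean baseline; the split and the metric stay the same, which is what makes RMSE figures on a public dataset comparable across schemes.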
1.2 Main contribution
This paper's primary contributions are twofold.
- First, a scalable CF algorithm called Weighted KM-Slope-VU is proposed, which significantly reduces time complexity and is suitable for a real-time recommendation system. Specifically, by exploiting users' profile attributes, our scheme first adopts the K-means clustering algorithm to partition users into several clusters. Each cluster then produces a virtual opinion leader (i.e., a Virtual User (VU)) to represent all other users in the cluster for the evaluation of the items. Weighted KM-Slope-VU is designed and applied to the VUs in place of the original user rating data, which reduces the dimensions of the user-item matrix; predictions are made on the basis of the VU-item matrix, which significantly reduces the time complexity. The method is innovative and efficient; experiments on the MovieLens dataset illustrate that our scheme is comparable to existing work (including two popular Matrix Factorization (MF)-based RS methods, Singular-Value Decomposition (SVD) and SVD++).
- Second, we have constructed a working personalized web-based movie recommendation system called MovieWatch (http://121.42.174.147:8080/Movie/login.action), opened to the public. We collected preliminary feedback from registered users of this live system and used this real-life data as a basis to evaluate the feasibility and accuracy of our system.
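The pipeline outlined in the first contribution (cluster users by profile, build one virtual user per cluster, then predict from the VU-item matrix) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the profile features and toy ratings are hypothetical, each VU rating is assumed to be the mean of its cluster members' ratings, and the prediction step uses the standard count-weighted Slope One formula, since this excerpt does not give the paper's exact weighting.

```python
import random
from collections import defaultdict

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def build_virtual_users(ratings, labels, k):
    """One virtual user per cluster: item -> mean rating of cluster members (assumption)."""
    sums = [defaultdict(float) for _ in range(k)]
    counts = [defaultdict(int) for _ in range(k)]
    for user, items in ratings.items():
        for item, r in items.items():
            sums[labels[user]][item] += r
            counts[labels[user]][item] += 1
    return [{i: sums[c][i] / counts[c][i] for i in sums[c]} for c in range(k)]

def weighted_slope_one(rows, known, target):
    """Predict `target` from a user's `known` ratings using deviations over `rows`."""
    num = den = 0.0
    for item, r in known.items():
        if item == target:
            continue
        # Deviation of target relative to item, over rows that rated both.
        diffs = [row[target] - row[item] for row in rows if target in row and item in row]
        if diffs:
            num += (sum(diffs) / len(diffs) + r) * len(diffs)
            den += len(diffs)
    return num / den if den else None

# Hypothetical profiles: (age scaled to [0, 1], gender flag), indexed by user id.
profiles = [(0.20, 0), (0.25, 0), (0.22, 0), (0.80, 1), (0.85, 1), (0.78, 1)]
# Hypothetical ratings: user id -> {item: rating}.
ratings = {
    0: {"A": 5, "B": 3}, 1: {"A": 4, "B": 2, "C": 4}, 2: {"B": 3, "C": 5},
    3: {"A": 2, "B": 4}, 4: {"A": 1, "C": 2}, 5: {"B": 5, "C": 3},
}

labels = kmeans(profiles, k=2)
vus = build_virtual_users(ratings, labels, k=2)
pred = weighted_slope_one(vus, {"A": 4, "B": 3}, "C")
print(labels, pred)
```

The efficiency argument is visible in the shapes: deviations are computed over `k` virtual-user rows rather than over every original user, so the cost of the prediction step depends on the number of clusters instead of the number of users.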
The rest of this paper is organized as follows: Section 2 provides a literature survey of previous related works; in Section 3, we present the Slope One algorithm and the newly proposed algorithm, Weighted KM-Slope-VU, in detail, and illustrate its performance through experiments; Section 4 specifies the deployed MovieWatch system, and discusses the experimental results; finally, Section 5 briefly concludes the paper and discusses avenues for future research.
2 Related Work
As shown in Fig. 1, existing recommendation algorithms can be divided into four kinds: content-based, knowledge-based, CF, and hybrid. Among these recommendation algorithms, CF is the most popular technique, based on the core assumption that users who have expressed similar interests in the past will share common interests in the future[5]. CF methods can be model-based or memory-based.
Tsinghua Science and Technology, April 2020, 25(2): 180-191
Fig. 1 Categories of existing recommendation algorithms.
Model-based algorithms first construct a model to represent user behavior and, therefore, to predict their ratings. The parameters of the model are estimated using the data from the rating matrix. There are many model-based approaches: Principal Component Analysis (PCA) and SVD are based on algebra[6,7], Bayes methods are based on statistics[8]. Matrix factorization for recommender systems has been a special focus of a voluminous amount of research, especially since the Netflix Prize competition was announced. This methodology transforms both items and users to the same latent factor space, thus making them directly comparable. The latent space tries to explain ratings by characterizing both products and users on factors automatically inferred from user feedback. For e