《计算机应用研究》|Application Research of Computers

基于Spark的混合协同过滤算法改进与实现

New improvement and implementation of hybrid collaborative filtering algorithm based on Spark platform

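Authors: Wang Yuanlong (王源龙), Sun Weizhen (孙卫真), Xiang Yong (向勇)
Affiliations: 1. Department of Computer Science and Technology, College of Information Engineering, Capital Normal University, Beijing 100048; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084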
Article ID: 1001-3695(2019)03-042-0855-06
DOI: 10.19734/j.issn.1001-3695.2017.10.0933
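Abstract: To address the sparsity, scalability, and personalization problems of traditional collaborative filtering, this paper applies the idea of algorithm ensembles to improve a hybrid collaborative filtering algorithm on the Spark platform. Borrowing from Stacking ensemble learning, it combines multiple weak recommenders by linear weighting into a single, more comprehensive recommender. The algorithm builds on neighborhood-based collaborative filtering and refines the neighbor-similarity calculation using category, popularity, and rating-favorability information. The aim is to make the similarities more reasonable and cheaper to compute, which alleviates rating sparsity to some extent. The algorithm also exploits the Spark distributed computing platform, using its stream processing and distributed storage structure to design and implement an incremental iterative model of the recommendation algorithm, which addresses the scalability and real-time limitations of collaborative filtering. The experiments use the UCI public datasets MovieLens and Netflix movie ratings. The results show that the improved algorithm performs well in personalization, accuracy, and scalability, improves on earlier algorithms of the same type to varying degrees, and offers a feasible algorithm-ensemble scheme for recommender-system applications.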
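The linear weighting of weak recommenders described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen, so the least-squares fit on held-out ratings below is an assumption, as are the helper names `solve`, `fit_weights`, and `blend` and the toy numbers. It is plain Python rather than Spark code and is not the authors' implementation.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(base_preds: List[List[float]], targets: List[float]) -> List[float]:
    """Fit one weight per base recommender by ordinary least squares.

    base_preds[r][k] is the prediction of base recommender k for held-out
    rating r; targets[r] is the true rating. Solves the normal equations.
    """
    k = len(base_preds[0])
    gram = [[sum(row[i] * row[j] for row in base_preds) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * t for row, t in zip(base_preds, targets))
           for i in range(k)]
    return solve(gram, rhs)


def blend(preds: List[float], weights: List[float]) -> float:
    """Linearly weighted combination of the base recommenders' predictions."""
    return sum(w * p for w, p in zip(weights, preds))


# Toy held-out data (hypothetical): two base recommenders, four ratings.
held_out_preds = [[4.0, 2.0], [3.0, 5.0], [1.0, 2.0], [5.0, 4.0]]
held_out_truth = [3.4, 3.6, 1.3, 4.7]
weights = fit_weights(held_out_preds, held_out_truth)
final_score = blend([4.0, 2.0], weights)
```

In this sketch the weights are learned once on held-out data and then reused for every prediction. A production stacker would typically add regularization and cross-validated base predictions.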
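Keywords: ensemble learning; collaborative filtering; sparsity; scalability; Spark Streaming; incremental model; classification
Funding: Science and Technology Program of the Beijing Municipal Education Commission (KM201310028014)
Article URL: http://www.arocmag.com/article/01-2019-03-042.html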
Author names (English): Wang Yuanlong, Sun Weizhen, Xiang Yong
Affiliations (English): 1. Dept. of Computer Science & Technology, College of Information Engineering, Capital Normal University, Beijing 100048, China; 2. Dept. of Computer Science & Technology, Tsinghua University, Beijing 100084, China
English abstract: To address the sparsity, scalability, and personalization problems of collaborative filtering, this paper uses algorithm integration to optimize and improve a hybrid collaborative filtering algorithm on the Spark platform. Following the Stacking ensemble model, it integrates multiple weak recommenders by linear weighting into one comprehensive recommender. First, the algorithm optimizes nearest-neighbor collaborative filtering by pre-classifying items and adjusting the similarity calculation with popularity and praise degree, which improves the rationality of the similarities and reduces the complexity of computing them. This alleviates rating sparsity to some extent. Second, the algorithm is tightly integrated with a distributed computing platform: it uses Spark Streaming and a distributed storage structure to design and implement an incremental iterative model of the recommendation algorithm, which addresses the poor scalability and real-time performance of collaborative filtering. The experiments use the UCI public data sets MovieLens and Netflix movie ratings. The results show that the improved algorithm performs well in personalized recommendation, accuracy, and scalability, and improves on previous algorithms to varying degrees. It provides a feasible algorithm integration scheme for recommender system applications.
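The abstract does not spell out the incremental model. One common way to make item-based collaborative filtering incremental is to keep running sums for cosine similarity, so that a new rating only touches the item pairs it affects. The sketch below assumes that approach. The class `IncrementalItemSimilarity` and its methods are hypothetical names, the code is plain Python rather than Spark Streaming, and it omits the paper's category, popularity, and praise-degree adjustments.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Tuple


class IncrementalItemSimilarity:
    """Item-item cosine similarity maintained from a stream of ratings."""

    def __init__(self) -> None:
        # user -> {item: rating}, needed to find the pairs a new rating affects
        self.user_ratings: Dict[str, Dict[str, float]] = defaultdict(dict)
        # (item_a, item_b) with item_a < item_b -> running sum of r_a * r_b
        self.dot: Dict[Tuple[str, str], float] = defaultdict(float)
        # item -> running sum of squared ratings
        self.norm_sq: Dict[str, float] = defaultdict(float)

    def add_rating(self, user: str, item: str, rating: float) -> None:
        """Fold one new rating into the running sums.

        Assumes each user rates an item at most once; re-ratings would
        need the old contribution subtracted first.
        """
        if item in self.user_ratings[user]:
            raise ValueError("re-rating is not handled in this sketch")
        for other, other_rating in self.user_ratings[user].items():
            key = (item, other) if item < other else (other, item)
            self.dot[key] += rating * other_rating
        self.norm_sq[item] += rating * rating
        self.user_ratings[user][item] = rating

    def similarity(self, a: str, b: str) -> float:
        """Cosine similarity of two items' rating vectors (0.0 if either is unseen)."""
        key = (a, b) if a < b else (b, a)
        denom = sqrt(self.norm_sq.get(a, 0.0) * self.norm_sq.get(b, 0.0))
        return self.dot.get(key, 0.0) / denom if denom else 0.0


# Usage: feed ratings one at a time, as a streaming job would per record.
model = IncrementalItemSimilarity()
for user, item, rating in [("u1", "A", 4.0), ("u1", "B", 2.0),
                           ("u2", "A", 3.0), ("u2", "B", 5.0)]:
    model.add_rating(user, item, rating)
score = model.similarity("A", "B")
```

Each update costs time proportional to the number of items the user has already rated, instead of recomputing the full similarity matrix. In a distributed setting the running sums would live in shared storage rather than in process memory; that wiring is left out here.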
English keywords: integrated learning; collaborative filtering; sparsity; extensibility; Spark streaming; incremental model; classification
Received: 2017/10/11
Revised: 2017/12/4
Pages: 855-860
CLC number: TP301.6
Document code: A