Application Research of Computers

Analysis of user influence based on social network big data and Spark GraphX


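Authors Wen Xin, Chen Nengcheng, Xiao Changjiang
Affiliation a. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing; b. Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan 430079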
Article number 1001-3695(2018)03-0830-05
DOI 10.3969/j.issn.1001-3695.2018.03.039
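Abstract Analyzing user influence with social network big data helps identify highly influential users in the network environment and realize their social and commercial value. Traditional methods cannot efficiently process massive social network data, nor can they analyze user influence quantitatively and accurately. To solve this problem, this paper proposes an improved user influence evaluation model based on the PageRank algorithm. The model jointly considers each user's degree of connectivity and degree of activity, and uses Spark GraphX, which supports large-scale parallel graph computation, to quantitatively analyze and evaluate the influence of Weibo users quickly and efficiently. Experimental results show that the proposed method is more efficient and that the user influence results it produces are closer to the real situation.
Keywords data mining; social network big data; Spark GraphX; user influence analysis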
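Funding Innovative Research Group Project of the Natural Science Foundation of Hubei Province (2016CFA003)
National Natural Science Foundation of China (41301441)
National High Technology Research and Development Program of China (863 Program) (2013AA01A608)
Article URL http://www.arocmag.com/article/01-2018-03-039.html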
English title Analysis of user influence based on social network big data and Spark GraphX
Authors (English names) Wen Xin, Chen Nengcheng, Xiao Changjiang
Affiliation (English) a. State Key Laboratory of Information Engineering in Surveying, Mapping & Remote Sensing; b. Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan 430079, China
English abstract Analyzing user influence based on big data from social networks helps recognize highly influential users on the Internet and realize their social and economic value. Traditional methods cannot process massive social network data efficiently, nor can they analyze user influence quantitatively and precisely. To solve these problems, this paper proposes an improved user influence evaluation model derived from the classic PageRank algorithm. The model takes both user connectivity and user activity into consideration and uses Spark GraphX, which supports massively parallel computing, to analyze the influence of Weibo users quantitatively and precisely. Experiments show that the proposed approach is more efficient and yields more precise results.
English keywords data mining; big data from social networks; Spark GraphX; analysis of user influence
Received 2016/11/20
Revised 2017/1/11
Pages 830-834
CLC number TP391
Document code A
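The abstract describes the model only at a high level: a PageRank variant that accounts for user activity as well as connectivity, computed with Spark GraphX. The sketch below is a hypothetical, single-machine illustration of one way to combine the two, not the authors' formula. It assumes that an edge (u, v) means "u follows v", that every user has a non-negative activity score (for example a post count; this choice of score is an assumption), and that a follower splits its rank among its followees in proportion to their activity instead of evenly, as standard PageRank does. The function name `activity_weighted_pagerank`, the damping factor 0.85 and the tolerance are illustrative. On GraphX, the same per-edge update would run in parallel over the partitioned graph.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Hashable


def activity_weighted_pagerank(
    edges: Iterable[Tuple[Node, Node]],
    activity: Dict[Node, float],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[Node, float]:
    """Hypothetical PageRank variant (an assumption, not the paper's model).

    An edge (u, v) means "u follows v", so u passes part of its rank to v.
    Rank is split among followees in proportion to their activity score.
    """
    edges = list(edges)
    nodes = set(activity)
    for u, v in edges:
        nodes.add(u)
        nodes.add(v)
    n = len(nodes)

    out: Dict[Node, List[Node]] = {x: [] for x in nodes}
    for u, v in edges:
        out[u].append(v)

    # Precompute each node's outgoing transition weights; they sum to 1.
    weights: Dict[Node, List[Tuple[Node, float]]] = {}
    for u, targets in out.items():
        if not targets:
            continue  # dangling node: handled via uniform redistribution below
        total = sum(activity.get(v, 0.0) for v in targets)
        if total > 0:
            weights[u] = [(v, activity.get(v, 0.0) / total) for v in targets]
        else:
            # No active followees: fall back to the classic even split.
            weights[u] = [(v, 1.0 / len(targets)) for v in targets]

    rank = {x: 1.0 / n for x in nodes}
    for _ in range(max_iter):
        # Rank held by users who follow nobody is spread uniformly.
        dangling = sum(rank[x] for x in nodes if x not in weights)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {x: base for x in nodes}
        for u, targets in weights.items():
            for v, w in targets:
                new_rank[v] += damping * rank[u] * w
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy follow graph and hypothetical activity scores.
    follows = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
    posts = {"A": 1.0, "B": 1.0, "C": 5.0, "D": 0.0}
    scores = activity_weighted_pagerank(follows, posts)
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.4f}")
```

If every user has the same positive activity score, the weights reduce to 1/out-degree and the update becomes ordinary PageRank with uniform redistribution of dangling rank. The activity term is therefore the only point where this sketch departs from the baseline that GraphX's built-in `pageRank` computes.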