基于并行图计算的社区划分方法 - 计算机应用研究 编辑部 - 《计算机应用研究》唯一官方网站

《计算机应用研究》|Application Research of Computers

基于并行图计算的社区划分方法

Community partitioning method based on parallel graph calculation

免费全文下载 (已被下载 次)  
获取PDF全文
作者 谭敢锋,刘群
机构 重庆邮电大学 计算机科学与技术学院,重庆 400065
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2018)08-2265-05
DOI 10.3969/j.issn.1001-3695.2018.08.006
摘要 进行了以图计算形式划分社交网络社区的调查,在调查中发现如何提升图计算应用于大规模社交网络的计算速度和扩展性,一直是研究的难点。其中针对谱图分割社交网络社区划分进行了研究,为提高谱图划分精确性、解决图计算研究难点,提出了以改进谱图论为基础、并行为实验方式的社区划分方法。实验中利用三角模型,改进了谱图论中谱聚类算法相似权值矩阵构造方法,并结合分布式并行环境Spark实现了改进算法并行化。由于三角模型是基于网络结构实际相连关系,反映了网络节点之间的复杂关系,所以通过三角模型计算节点相似性更可靠。通过实际数据集结合并行方式分别与单机分析软件、其他并行算法进行对比实验,提出的并行化图计算方法能有效提升计算速度和扩展性,以及社区划分精确性,支持大规模社交网络的挖掘分析。
关键词 并行图计算;Spark;三角模型;谱聚类
基金项目 中国重庆研究生科研创新项目(CYS16161)
中国国家自然科学基金资助项目(61572091)
重庆自然科学基金资助项目(CSTC2014jcyjA40047)
CQUPT博士研究生创业项目((A2014)-20)
本文URL http://www.arocmag.com/article/01-2018-08-006.html
英文标题 Community partitioning method based on parallel graph calculation
作者英文名 Tan Ganfeng, Liu Qun
机构英文名 CollegeofComputerScience&Technology,ChongqingUniversityofPosts&Telecommunications,Chongqing400065,China
英文摘要 Division of social networking was surveyed in the form of graph calculation, in the survey found that how to improve the calculation and application of graph calculation in large scale social networking, has always been the difficulty. Which in view of the spectrogram segmentation social networking community division was studied, in order to improve the algorithm accuracy, resolve the computing research difficulties, this paper put forward the community partitioning method that based on improve spectrum graph theory, and the experiment based on parallel graph calculation. In the experiments used triangle model, it improved the spectrum graph spectral clustering algorithm in similar weight matrix, moreover combined distributed parallel environment Spark to implement the improved parallel algorithm. Due to the triangle model was based on the actual network structure of linked together, it reflected the complicate relationship between nodes of network, so through the triangular model to calculate the node similarity was more reliable. Through the actual data set, the parallel method was compared with the single analysis software and other parallel algorithm in the parallel condition. The parallelization graph calculation method can effectively improve the calculation speed and expansibility of algorithm, the accuracy of community partition and supports the mining of large-scale social network.
英文关键词 parallel graph calculation; Spark; triangular model; spectral clustering
参考文献 查看稿件参考文献
  [1] Xin R S, Gonzalez J E, Franklin M J, et al. GraphX:a resilient distributed graph system on Spark[C] //Proc of the 1st International Workshop on Graph Data Management Experiences and Systems. New York:ACM Press, 2013:Article No. 2.
[2] Han Zhijie, Zhang Yujie. Spark:a big data processing platform based on memory computing[C] //Proc of the 7th International Symposium on Parallel Architectures, Algorithms and Programming. Piscataway, NJ:IEEE Press, 2016:172-176.
[3] Bourdonov I B, Kossatchev A S. Parallel computations on a graph[J] . Programming & Computer Software, 2015, 41(1):1-13.
[4] Newman M E J, Girvan M. Finding and evaluating community structure in networks[J] . Physical Review E:Statistical Nonlinear & Soft Matter Physics, 2004, 69(2):026113.
[5] Ghoshal G, Zlatic' V, Caldarelli G, et al. Random hypergraphs and their applications[J] . Physical Review E:Statistical Nonlinear & Soft Matter Physics, 2009, 79(6):853-857.
[6] Alzate C, Suykens J A K. Sparse kernel spectral clustering models for large-scale data analysis[J] . Neurocomputing, 2011, 74(9):1382-1390.
[7] Heider F. The psychology of interpersonal relations[M] . [S. l. ] :Wiley, 1958.
[8] Antoine H, David A. Distributed graph layout with Spark[C] //Proc of International Conference on Information Visualisation. Washington DC:IEEE Computer Society, 2015:271-276.
[9] Ghazi M R, Gangodkar D. Hadoop, MapReduce and HDFS:a develo-pers perspective[J] . Procedia Computer Science, 2015, 48:45-50.
[10] Meila M, Shi Jianbo. A random walks view of spectral segmentation[C] // Proc of the 8th International Workshop on Artificial Intelligence and Statistics. 2001.
[11] Newman M E. Modularity and community structure in networks[J] . Proceedings of the National Academy of Sciences, 2006, 103(23):8577-8582.
[12] Lancichinetti A, Fortunato S. Community detection algorithms:a comparative analysis[J] . Physical Review E:Statistical Nonlinear & Soft Matter Physics, 2009, 80(2):056117.
[13] Cao Jiangzhong, Chen Pei, Dai Qingyun, et al. Local information-based fast approximate spectral clustering[J] . Pattern Recognition Letters, 2014, 38(1):63-69.
[14] Sakr S. Large-scale graph processing systems[M] //Big Data 2. 0 Processing Systems. [S. l. ] :Springer International Publishing, 2016:53-73.
[15] Benson A R, Gleich D F. Higher-order organization of complex networks[J] . Science, 2016, 353(6295):163-166.
收稿日期 2017/4/3
修回日期 2017/5/26
页码 2265-2269
中图分类号 TP301.6
文献标志码 A