Application Research of Computers

Graph pattern mining probability algorithm for big data

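Authors: Jiang Lili, Li Yefei, Dou Longlong, Chen Zhiqi, Qian Zhuzhong
Affiliations: 1. Jiangsu Frontier Electric Technology Co., Ltd., Nanjing 210000, China; 2. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China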
Article ID: 1001-3695(2020)12-004-3545-07
DOI: 10.19734/j.issn.1001-3695.2019.09.0539
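Abstract: In the big data era, processing frameworks such as MapReduce have limited capacity and are often slow and inefficient on graph data; 3-clique counting is a typical example. An efficient algorithm for this class of clique counting problems is therefore needed. Because 3-clique counting has already been studied in depth in prior work, this paper investigates its extension, the 4-clique counting problem. Guided by a heuristic idea, it proposes a probabilistic sampling algorithm based on neighboring-edge sampling, and uses the Chernoff inequality to prove that, under the approximation conditions, the algorithm needs only a certain number of samplers to guarantee the relative error. An experimental comparison shows that, relative to a traditional exact algorithm, the probabilistic sampling algorithm gives up a small amount of accuracy but has large advantages in running time and space usage. The paper concludes that the algorithm has considerable practical value in real applications.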
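Keywords: 4-clique counting problem; probabilistic algorithm; graph pattern mining; big data processing; approximation algorithm
Funding: National Natural Science Foundation of China, General Program (61872175);
Natural Science Foundation of Jiangsu Province, General Program (BK20181252)
URL: http://www.arocmag.com/article/01-2020-12-004.html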
Received: 2019/9/21
Revised: 2019/11/15
Pages: 3545-3551
CLC number: TP316.4
Document code: A
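The abstract names the problem and the sampling idea but does not spell out the algorithm. As an illustration only, the Python sketch below shows what 4-clique counting involves and how a generic uniform edge-sampling estimator works; it is not the authors' neighboring-edge sampler, and every function name (`build_adjacency`, `cliques4_on_edge`, `exact_4clique_count`, `sampled_4clique_estimate`) is hypothetical. It assumes an undirected simple graph with orderable (e.g. integer) node labels and relies on one fact: every 4-clique contains exactly six edges, so summing the per-edge 4-clique counts over all m edges gives 6K, and averaging those counts over uniformly sampled edges estimates 6K/m without bias.

```python
import random
from itertools import combinations


def build_adjacency(edges):
    """Build adjacency sets for an undirected simple graph (self-loops are skipped)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cliques4_on_edge(adj, u, v):
    """Count 4-cliques containing edge (u, v): adjacent pairs {w, x} of common neighbours."""
    common = adj[u] & adj[v]
    return sum(1 for w, x in combinations(common, 2) if x in adj[w])


def edge_list(adj):
    """List each undirected edge once (assumes orderable, e.g. integer, node labels)."""
    return [(u, v) for u in adj for v in adj[u] if u < v]


def exact_4clique_count(adj):
    """Exact baseline: each 4-clique has 6 edges, so the per-edge sum equals 6 * K."""
    return sum(cliques4_on_edge(adj, u, v) for u, v in edge_list(adj)) // 6


def sampled_4clique_estimate(adj, num_samples, seed=None):
    """Generic uniform edge sampling (hypothetical illustration, not the paper's method).

    For a uniformly random edge e, E[c(e)] = 6K / m, so m * mean(c) / 6 is an
    unbiased estimate of the number K of 4-cliques.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    edges = edge_list(adj)
    if not edges:
        return 0.0
    rng = random.Random(seed)
    hits = sum(cliques4_on_edge(adj, *rng.choice(edges)) for _ in range(num_samples))
    return len(edges) * hits / (6 * num_samples)


if __name__ == "__main__":
    k5 = build_adjacency(combinations(range(5), 2))    # complete graph on 5 vertices
    print(exact_4clique_count(k5))                    # 5, i.e. C(5, 4)
    print(sampled_4clique_estimate(k5, 100, seed=1))  # 5.0: every edge of K5 lies in 3 4-cliques
```

On K5 the estimate is exact because every edge lies in the same number of 4-cliques; on real graphs the per-edge counts vary, so the estimate fluctuates around the true value. How many samples are needed for a target relative error is the kind of question a concentration bound such as the Chernoff inequality, which the paper uses, answers; this sketch does not reproduce that bound. The exact baseline has to examine every edge, whereas the sampler examines only `num_samples` edges, which is the source of the time and space savings that sampling approaches aim for.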