English title | Graph pattern mining probability algorithm for big data |
Authors (English) | Jiang Lili, Li Yefei, Dou Longlong, Chen Zhiqi, Qian Zhuzhong |
Affiliations (English) | 1. Jiangsu Frontier Electric Technology Co., Ltd., Nanjing 210000, China; 2. Dept. of Computer Science & Technology, Nanjing University, Nanjing 210023, China |
English abstract | In the big data era, processing frameworks such as MapReduce are often slow and inefficient, especially on graph data. An efficient algorithm for clique counting problems on such data is therefore needed. Because prior work has thoroughly explored 3-clique counting, its extension, the 4-clique counting problem, is gaining importance. Guided by a heuristic idea, this paper proposes a probability sampling algorithm based on neighboring edge sampling to solve the extended problem. Using the Chernoff inequality, the paper shows that a certain number of samplers is enough to guarantee the relative error of the approximation. Experimental evaluation and comparison show that the probability sampling algorithm loses a small amount of precision relative to the traditional exact algorithm, but has a large advantage in running time and memory usage. The paper concludes that the algorithm has considerable practical value. |
English keywords | 4-clique counting problem; probability algorithm; graph pattern mining; big data processing; approximation algorithm |