《计算机应用研究》|Application Research of Computers

基于MapReduce的top-k高效用模式挖掘算法

Top-k high utility pattern mining algorithm based on MapReduce

免费全文下载 (已被下载 次)  
获取PDF全文
作者 吴倩,王林平,罗相洲,崔建群,王海
机构 华中师范大学 a.计算机学院;b.科技处,武汉 430079
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2017)10-2897-04
DOI 10.3969/j.issn.1001-3695.2017.10.004
摘要 高效用模式挖掘被广泛地应用于数据挖掘领域。为了挖掘指定数量的高效用模式,一些基于树结构和效用表结构的top-k高效用挖掘算法被提出,但前者在挖掘过程中产生了大量候选模式,后者在效用模式增长时需要进行多次比较;同时,由于在信息社会,数据量呈爆炸性增长,所以在数据集过大的情况下,挖掘高效用模式需以大量存储空间以及计算开销为代价。为了解决这两个问题,基于MapReduce的top-k高效用模式挖掘算法(TKHUP_MaR)被提出。该算法通过两次扫描数据库,利用三次MapReduce来实现并行top-k高效用模式的挖掘。通过实验表明TKHUP_MaR 算法在并行挖掘top-k高效用模式的过程中是有效的。
关键词 数据挖掘;top-k;高效用模式;MapReduce;并行算法
基金项目 国家自然科学基金资助项目(61370108)
本文URL http://www.arocmag.com/article/01-2017-10-004.html
英文标题 Top-k high utility pattern mining algorithm based on MapReduce
作者英文名 Wu Qian, Wang Linping, Luo Xiangzhou, Cui Jianqun, Wang Hai
机构英文名 a.SchoolofComputer,b.Dept.ofScience&Technology,CentralChinaNormalUniversity,Wuhan430079,China
英文摘要 High utility pattern mining has been widely applied in the field of data mining. Some top-k high utility pattern mi-ning algorithms based on tree-like and list-like structures were proposed. However, tree-like algorithms generated a large number of candidates, and comparing operation was costly during the process of utility pattern growth in list-like algorithms. In addition, the amount of information data increased exponentially in information society. Thus, it required memory usage and computational cost in mining process, especially the dataset size was huge. In order to address above issues, this paper proposed top-k high utility pattern mining algorithm based on MapReduce, called TKHUP_MaR. TKHUP_MaR needed to scan database twice and used three MapReduce phases to parallelize top-k high utility pattern mining. The experiment results show that TKHUP_MaR is effective in the process of mining top-k high utility patterns on parallel environment.
英文关键词 data mining; top-k; high utility pattern; MapReduce; parallel algorithm
参考文献 查看稿件参考文献
  [1] Tseng V S, Wu Chengwei, Shie B E, et al. UP-growth:an efficient algorithm for high utility itemset mining[C] //Proc of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2010:253-262.
[2] Liu Mengchi, Qu Junfeng. Mining high utility itemsets without candidate generation[C] //Proc of the 21st ACM International Conference on Information and Knowledge Management. New York:ACM Press, 2012:55-64.
[3] Fournier-Viger P, Wu Chengwei, Zida S, et al. FHM:faster highutility itemset mining using estimated utility co-occurrence pruning[M] //Foundations of Intelligent Systems. [S. l. ] :Springer International Publishing, 2014:83-92.
[4] Zida S, Fournier-Viger P, Lin Chunwei, et al. EFIM:a highly efficient algorithm for high-utility itemset mining[C] //Advances in Artificial Intelligence and Soft Computing. [S. l. ] :Springer International Publishing, 2015:530-546.
[5] Wu Chengwei, Shie B E, Tseng V S, et al. Mining top-k high utility itemsets[C] //Proc of the 18th ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining. New York:ACM Press, 2012:78-86.
[6] Ryang H, Yun U. Top-k high utility pattern mining with effective threshold raising strategies[J] . Knowledge-Based Systems, 2015, 76(1):109-126.
[7] Tseng V S, Wu Chengwei, Fournier-Viger P, et al. Efficient algorithms for mining top-k high utility itemsets[J] . IEEE Trans on Knowledge and Data Engineering, 2016, 28(1):54-67.
[8] Dean J, Ghemawat S. MapReduce:simplified data processing on large clusters[J] . Communications of the ACM, 2008, 51(1):107-113.
[9] Li Haoyuan, Wang Yi, Zhang Dong, et al. PFP:parallel FP-growth for query recommendation[C] //Proc of ACM Conference on Recommender Systems. New York:ACM Press, 2008:107-114.
[10] 陈光鹏, 杨育彬, 高阳, 等. 一种基于MapReduce的频繁闭项集挖掘算法[J] . 模式识别与人工智能, 2012, 25(2):220-224.
[11] 杨勇, 高松松. 基于MapReduce的关联规则并行增量更新算法[J] . 重庆邮电大学学报:自然科学版, 2014, 26(5):670-678.
[12] 唐颖峰, 陈世平. 一种基于后缀项表的并行闭频繁项集挖掘算法[J] . 计算机应用研究, 2014, 31(2):373-377.
收稿日期 2016/7/17
修回日期 2016/8/29
页码 2897-2900,2932
中图分类号 TP301.6
文献标志码 A