《计算机应用研究》|Application Research of Computers

基于MapReduce计算模型的并行关联规则挖掘算法研究综述

Parallel association rules mining algorithm based on MapReduce: a survey

免费全文下载 (已被下载 次)  
获取PDF全文
作者 肖文,胡娟,周晓峰
机构 1.河海大学文天学院 电气信息工程系,安徽 马鞍山 243031;2.河南大学 计算机与信息学院,南京 210098
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2018)01-0013-11
DOI 10.3969/j.issn.1001-3695.2018.01.003
摘要 随着数据的爆炸式增长,传统的算法已不能适应大数据挖掘的需要,需要分布式、并行的关联规则挖掘算法来解决上述问题。MapReduce是一种流行的分布式并行计算模型,因其使用简单、伸缩性好、自动负载均衡和自动容错等优点,得到了广泛的应用。对已有的基于MapReduce计算模型的并行关联规则挖掘算法进行了分类和综述,对其各自的优缺点和适用范围进行了总结,并对下一步的研究进行了展望。
关键词 数据挖掘;关联规则挖掘;频繁项集;并行;MapReduce;Hadoop
基金项目 安徽省高校自然科学研究项目(KJ2016A623)
本文URL http://www.arocmag.com/article/01-2018-01-003.html
英文标题 Parallel association rules mining algorithm based on MapReduce: a survey
作者英文名 Xiao Wen, Hu Juan, Zhou Xiaofeng
机构英文名 1.Dept.ofElectricalInformationEngineering,HohaiUniversityWentianCollege,MaanshanAnhui243031,China;2.SchoolofComputer&Information,HohaiUniversity,Nanjing210098,China
英文摘要 With the explosive growth of data, traditional algorithms couldn’t meet the needs of the large data mining, it needed distributed parallel algorithm for mining association rules to solve the problem of mining association rules in large data.Map-Reduce was a kind of popular distributed parallel computing model, because of its simple to use, good scalability, the advantages of automatic load balancing and fault tolerance, had been widely used. This paper classified and reviewed the existing parallel algorithm for association rules minging based on MapReduce, summarized their respective advantages and disadvantages and scope of application, and prospected the next research.
英文关键词 data mining; association rules mining; frequent itemset; parallel; MapReduce; Hadoop
参考文献 查看稿件参考文献
  [1] Agrawal R, Imielinski T. Mining association rules between sets of items in large databases[J] . ACM SIGMOD Record, 1999, 22(2):207-216.
[2] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases[C] //Proc of International Conference on Very Large Data Bases. San Francisco:Morgan Kaufmann Publishers Inc, 1994:487-499.
[3] Park J S, Chen M S, Yu P S. Using a hash-based method with transa-ction trimming for mining association rules[J] . IEEE Trans on Knowledge & Data Engineering, 1997, 9(5):813-825.
[4] Gvenir H A. An algorithm for mining association rules using perfect hashing and database pruning[C] //Proc of Turkish Symposium on Artificial Intelligence and Neural Networks. 2001.
[5] Brin S, Motwani R, Ullman J D, et al. Dynamic itemset counting and implication rules for market basket data[J] . ACM SIGMOD Record, 2001, 26(2):255-264.
[6] Han Jiawei, Pei Jian, Yin Yiwen, et al. Mining frequent patterns without candidate generation:a frequent-pattern tree approach[J] . Data Mining & Knowledge Discovery, 2015, 8(1):53-87.
[7] Pyun G, Yun U, Ryu K H. Efficient frequent pattern mining based on linear prefix tree[J] . Knowledge-Based Systems, 2014, 55:125-139.
[8] Tsay Y J, Hsu T J, Yu J R. FIUT:a new method for mining frequent itemsets[J] . Information Sciences, 2009, 179(11):1724-1737.
[9] Lin K C, Liao I E, Chen Zhisheng. An improved frequent pattern growth method for mining association rules[J] . Expert Systems with Applications, 2011, 38(5):5154-5161.
[10] Tseng F C. An adaptive approach to mining frequent itemsets efficiently[J] . Expert Systems with Applications, 2012, 39(18):13166-13172.
[11] Apache Hadoop[EB/OL] . http://hadoop. apache. org/.
[12] Zaki M J, Parthasarathy S, Ogihara M, et al. New algorithms for fast discovery of association rules[C] //Proc of International Conference on Knowledge Discovery & Data Mining. 1999:283-286.
[13] Dean J, Ghemawat S. MapReduce:simpli_ed data processing on large clusters[C] //Proc of the 6th Symposium on Operating System Design and Implementation. 2004.
[14] Li Ning, Zeng Li, He Qing, et al. Parallel implementation of apriori algorithm based on MapReduce[C] //Proc of ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing. [S. l. ] :IEEE Press, 2012:236-241.
[15] 黄立勤, 柳燕煌. 基于MapReduce并行的Apriori算法改进研究[J] . 福州大学学报:自然科学版, 2011, 39(5):680-685.
[16] 孙赵旭, 谢晓兰, 周国清, 等. 基于Hadoop的Apriori算法与实现[J] . 桂林理工大学学报, 2014, 34(3):584-588.
[17] Zhou Xinhao, Huang Yongfeng. An improved parallel association rules algorithm based on mapreduce framework for big data[C] //Proc of the 10th International Conference on Natural Computation. 2014:284-288.
[18] 周国军, 龚榆桐. 基于MapReduce和矩阵的频繁项集挖掘算法[J] . 微电子学与计算机, 2016, 33(5):119-123.
[19] Li Lingjuan, Zhang Min. The strategy of mining association rule based on cloud computing[C] //Proc of International Conference on Business Computing and Global Informatization. 2011:475-478.
[20] Yu Runming, Lee M G, Huang Yuanshao, et al. An efficient frequent patterns mining algorithm based on MapReduce framework[C] //Proc of International Conference on Software Intelligence Technologies and Appliations. 2014.
[21] Mao Weijun, Guo Weibin. An improved association rules mining algorithm based on power set and Hadoop[C] //Proc of International Conference on Information Science and Cloud Computing Companion. [S. l. ] :IEEE Computer Society, 2013.
[22] Liu Shenghui, Liu Shijia, Chen Shixuan, et al. IOMRA :a high efficiency frequent itemset mining algorithm based on the MapReduce computation model[C] //Proc of IEEE International Conference on Computational Science and Engineering. 2014:1290-1295.
[23] Yahya O, Hegazy O, Ezat E. An efficient implementation of Apriori algorithm based on Hadoop-MapReduce model[J] . International Journal of Reviews in Computing, 2012, 12:59-67.
[24] Farzanyar Z, Cercone N. Efficient mining of frequent itemsets in social network data based on MapReduce framework[C] //Proc of International Conference on Advances in Social Networks Analysis and Mining. [S. l. ] :IEEE Computer Society, 2013:1183-1188.
[25] Farzanyar Z, Cercone N. Accelerating frequent itemsets mining on the cloud:a MapReduce-based approach[C] //Proc of International Conference on Data Mining. 2013:592-598.
[26] Huang Dachuan, Song Yang, Routray R, et al. Smart cache:an optimized mapreduce implementation of frequent itemset mining[C] //Proc of IEEE International Conference on Cloud Engineering. 2015:16-25.
[27] 谢志明, 王鹏. 一种基于MapReduce架构的并行矩阵Apriori算法[J] . 计算机应用研究, 2017, 34(2):401-404.
[28] Pavón J, Viana S, Gomez S. Matrix apriori:speeding up the search for frequent patterns[C] //Proc of Iasted International Conference on Databases and Applications. 2006:75-82.
[29] Li Haoyuan, Wang Yi, Zhang Dong, et al. PFP:parallel FP-Growth for query recommendation[C] //Proc of ACM Conference on Recommender Systems. 2008:107-114.
[30] 杨勇, 王伟. 一种基于MapReduce的并行FP-Growth算法[J] . 重庆邮电大学学报:自然科学版, 2013, 25(5):651-657.
[31] Zhou Le, Zhong Zhiyong, Chang Jin, et al. Balanced parallel FP-Growth with MapReduce[C] //Proc of IEEE Youth Conference on Information Computing and Telecommunications. 2010:243-246.
[32] Wang Yong, Zhang Zhe, Wang Fang. A parallel algorithm of association rules based on cloud computing[C] //Proc of International ICST Conference on Communications and Networking in China. 2013:415-419.
[33] 陈兴蜀, 张帅, 童浩, 等. 基于布尔矩阵和MapReduce的FP-Growth算法[J] . 华南理工大学学报:自然科学版, 2014, 42(1):135-141.
[34] Xun Yaling, Zhang Jifu, Qin Xiao. FiDoop:parallel mining of frequent itemsets using MapReduce[J] . IEEE Trans on Systems Man & Cybernetics Systems, 2016, 46(3):313-325.
[35] Tsay Y J, Hsu T J, Yu J R. FIUT:a new method for mining frequent itemsets[J] . Information Sciences, 2009, 179(11):1724-1737.
[36] Liang Y H, Wu S Y. Sequence-Growth:a scalable and effective frequent itemset mining algorithm for big data based on MapReduce framework[C] //Proc of IEEE International Congress on Big Data. New York:IEEE Press, 2015.
[37] Yan Xifeng, Han Jiawei, Afshar R. CloSpan:mining closed sequential patterns in large databases[C] //Proc of SIAM International Conference on Data Mining. 2003
[38] Zhang Zhigang, Ji Genlin, Tang Mengmeng. MREclat:an algorithm for parallel mining frequent itemsets[C] //Proc of International Conference on Advanced Cloud and Big Data. 2013:177-180.
[39] Moens S, Aksehirli E, Goethals B. Frequent itemset mining for big data[C] //Proc of IEEE International Conference on Big Data. 2013:111-118.
[40] 张春, 汲磊举. 基于MapReduce的Eclat改进算法研究与应用[J] . 北京交通大学学报, 2016, 40(3):1-6.
[41] Gole S, Tidke B. Frequent itemset mining for big data in social media using ClustBigFIM algorithm[C] //Proc of International Conference on Pervasive Computing. [S. l. ] :IEEE Press. 2015:1-6.
[42] Zhao Weizhong, Ma Huifang, He Qing. Parallel K-means clustering based on MapReduce[C] //Proc of International Conference on Cloud Computing. Berlin:Springer-Verlag, 2009:674-679.
[43] Savasere A, Omiecinski E, Navathe S B. An efficient algorithm for mining association rules in large databases[C] //Proc of International Conference on Very Large Data Bases. San Francisco:Morgan Kaufmann Publishers Inc, 1995:432-444.
[44] Xiao Tao, Yuan Chunfeng, Huang Yihua. PSON:a parallelized SON algorithm with MapReduce for mining frequent sets[C] //Proc of International Symposium on Parallel Architectures. [S. l. ] :IEEE Computer Society, 2011:252-257.
[45] 郭进伟, 皮建勇. 基于MapReduce的SON算法实现[J] . 计算机应用, 2014, 34(S1):100-102.
[46] Natarajan S, Sehar S. A novel algorithm for distributed data mining in HDFS[C] //Proc of the 5th International Conference on Advanced Computing. 2013.
[47] Liao Jinggui, Zhao Yuelong, Long Saiqin. MRPrePost:a parallel algorithm adapted for mining big data[C] //Proc of IEEE Workshop on Electronics, Computer and Applications. 2014:564-568.
[48] Deng Zhihong, Wang Zhonghui, Jiang Jiajian. A new algorithm for fast mining frequent itemsets using N-lists[J] . Sciece China Information Sciences, 2012, 55(9):2008-2030.
[49] Pei Jian, Han Jiawei, Lu Hongjun, et al. H-mine:fast and space-preserving frequent pattern mining in large databases[J] . IIE Tran-sactions, 2007, 39(6):593-605.
[50] Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules[C] // Proc of International Conference Database Theory. 1999:398-416.
[51] Han Jiawei, Pei Jian, Yin Yiwen. et al. Mining frequent pattern without candidate generation[C] //Proc of ACM SIGMOD International Conference on Management of Data. 2003.
[52] Wang Jianyong, Han Jiawei, Pei Jian. CLOSET+:searching for the best strategies for mining frequent closed itemsets[C] //Proc of Internationall Conference Knowledge Discovery and Data Mining. 2003:236-245.
[53] Grahne G, Zhu Jianfei. Efficiently using prefix-trees in mining frequent itemsets[C] //Proc of IEEE ICDM Workshop on Frequent Itemset Mining Implementations. 2003.
[54] Zaki M J, Hsiao C. Charm:an efficient algorithm for closed itemset mining[C] //Proc of SIAM International Conference Data Mining. 2002:57-473.
[55] Lucchese C, Orlando S. Perego R. Fast and memory efficient mining of frequent closed itemsets[J] . IEEE Trans on Knowledge and Data Engineering, 2006, 18(1):21-35. [56] Cheung D W, Han Jia, Ng V T, et al. Maintenance of discovered association rules in large databases:an incremental updating technique[C] //Proc of the 12th IEEE International Conference on Data Engineering. 1996:106-114.
[57] Cheung D W, Lee S D, Kao B. A general incremental technique for maintaining discovered association rules[C] //Proc of the 5th International Conference on Database for Advanced Applications. 1997:185-194.
[58] Ayan N F, Tansel A U, Arkun E. An efficient algorithm to update large itemsets with early pruning[C] //Proc of SIGKDD. 1999:287-291.
[59] Srikant R, Agrawal R. Mining sequential patterns:generalizations and performance improvements[C] //Proc of the 5th Conference on Extending Database Technology. 1996:3-17.
[60] Hong T P, Wang C Y, Tao Yuhui. A new incremental data mining algorithm using pre-large itemsets[J] . Intelligent Data Analysis, 2001, 5(2):111-129.
[61] Koh J L, Shieh S F. An efficient approach for maintaining association rules based on adjusting FP-Tree structures[C] //Proc of Database Systems for Advanced Applications. 2004:417-424.
[62] Cheung W, Zaane O R. Incremental mining of frequent-patterns without candidate gneration or support constraint[C] //Proc of International Database Engineering and Applications Symposium. 2003:111-116.
[63] Leung C K S, Khan Q I, Hoque T, et al. CanTree:a tree structure for efficient incremental mining of frequent patterns[C] //Proc of the 5th IEEE InternationalConference on Data Mining. 2005.
[64] Totad S G, Geeta R B, Reddy P P. Batch processing for incremental FP-tree construction[J] . International Journal of Computer Applications, 2011, 5(5):28-32.
[65] Han Jiawei, Kamber M, Pei Jian, et al. 数据挖掘:概念与技术[M] . 范明, 孟小峰译. 3版. 北京:机械工业出版社, 2012:186-188.
收稿日期 2016/12/13
修回日期 2017/2/9
页码 13-23
中图分类号 TP301.6
文献标志码 A