《计算机应用研究》|Application Research of Computers

改进的并行关联规则增量挖掘算法

Improved parallel association rules incremental mining algorithm

Authors 毛伊敏, 邓千虎, 邓小鸿, 刘蔚
Affiliation 江西理工大学 a. 信息工程学院; b. 应用科学学院, 江西 赣州 341000
Article ID 1001-3695(2021)10-015-2974-07
DOI 10.19734/j.issn.1001-3695.2021.03.0084
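Abstract In big data environments, incremental association rule algorithms based on the Can-tree (canonical order tree) suffer from excessive space usage by the tree structure, inefficient frequent pattern mining, and insufficient parallel performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning), an improved parallel incremental association rule mining algorithm based on rough sets and merge pruning. First, a rough-set-based similar item merging strategy, RS-SIM (rough set based similar item merge), merges similar items in the dataset, and the Can-tree is built from the merged data, which reduces the space occupied by the tree structure. Second, a merge pruning strategy, MPS (merge pruning strategy), prunes and merges propagation paths in the tree structure, speeding up frequent item mining by compressing the frequent pattern search space. Finally, a dynamic scheduling strategy, DSS (dynamic scheduling strategy), dynamically schedules computing tasks in a heterogeneous MapReduce cluster, achieving load balancing and effectively improving the cluster's parallel computing capability. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.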
Keywords Can-tree; rough set; merge pruning; big data; incremental mining
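Funding National Key Research and Development Program of China (2018YFC1504705)
National Natural Science Foundation of China (41562019, 61762046)
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ209407)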
Article URL http://www.arocmag.com/article/01-2021-10-015.html
Title (English) Improved parallel association rules incremental mining algorithm
Authors (English) Mao Yimin, Deng Qianhu, Deng Xiaohong, Liu Wei
Affiliation (English) a. School of Information Engineering, b. College of Applied Science, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China
Abstract (English) In the big data environment, incremental association rule algorithms based on the Can-tree suffer from excessive space occupation by the tree structure, poor efficiency of frequent pattern mining, and insufficient parallelization performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM. Firstly, it designs RS-SIM to merge similar items in the dataset and constructs the Can-tree from the merged data, thereby reducing the space occupation of the tree structure. Secondly, it proposes MPS to prune and merge the propagation paths in the tree structure, thereby compressing the frequent pattern search space to speed up frequent item mining. Finally, MR-PARIRM uses DSS to dynamically schedule the computing tasks in the heterogeneous MapReduce cluster, thereby achieving load balance and effectively improving the parallel computing capability of the cluster. The experimental simulation results show that MR-PARIRM performs relatively well in the big data environment and is suitable for parallel processing of large-scale data.
Keywords (English) Can-tree; rough set; merge pruning; big data; incremental mining
Received 2021/3/18
Revised 2021/5/14
Pages 2974-2980
CLC number TP311
Document code A
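
The abstract takes the Can-tree (canonical order tree) as its starting point for incremental mining. As a rough illustration of why that structure suits incremental updates, the sketch below builds a plain single-machine Can-tree in Python. It is not the authors' MR-PARIRM: it omits RS-SIM, MPS, DSS and MapReduce entirely. The class names `CanNode` and `CanTree`, the method names, and the choice of lexicographic order as the canonical order are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CanNode:
    """One tree node: an item and the number of transactions passing through it."""
    item: str
    count: int = 0
    children: Dict[str, "CanNode"] = field(default_factory=dict)


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption for this sketch: the canonical order is lexicographic.
    Because the order does not depend on item frequency, new transactions
    can be inserted later without reordering any existing path.
    """

    def __init__(self) -> None:
        self.root = CanNode(item="")
        self.size = 0  # number of transactions inserted so far

    def insert(self, transaction: Iterable[str]) -> None:
        node = self.root
        for item in sorted(set(transaction)):  # canonical order
            child = node.children.get(item)
            if child is None:
                child = CanNode(item)
                node.children[item] = child
            child.count += 1
            node = child
        self.size += 1

    def add_batch(self, transactions: Iterable[Iterable[str]]) -> None:
        """Incremental update: simply insert the new transactions."""
        for t in transactions:
            self.insert(t)

    def support(self, itemset: Iterable[str]) -> int:
        """Count transactions that contain every item of `itemset`."""
        target: List[str] = sorted(set(itemset))
        if not target:
            return self.size

        def walk(node: CanNode, i: int) -> int:
            total = 0
            for item, child in node.children.items():
                if item == target[i]:
                    # Matched target[i]; finish here or look for the next item below.
                    total += child.count if i + 1 == len(target) else walk(child, i + 1)
                elif item < target[i]:
                    # target[i] may still appear deeper on this path.
                    total += walk(child, i)
                # item > target[i]: canonical order rules out target[i] below.
            return total

        return walk(self.root, 0)

    def node_count(self) -> int:
        """Number of nodes excluding the root, a crude proxy for space usage."""
        stack = list(self.root.children.values())
        n = 0
        while stack:
            node = stack.pop()
            n += 1
            stack.extend(node.children.values())
        return n
```

A typical use would be to call `add_batch` once for the original dataset and again for each increment, then query `support` for candidate itemsets. `node_count` gives a simple way to observe the space cost that the abstract says RS-SIM is meant to reduce.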