《计算机应用研究》|Application Research of Computers

自适应布谷鸟搜索的并行K-means聚类算法

Parallel K-means clustering algorithm based on adaptive cuckoo search

免费全文下载 (已被下载 次)  
获取PDF全文
作者 王波,余相君
机构 重庆大学 计算机学院,重庆 400044
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2018)03-0675-05
DOI 10.3969/j.issn.1001-3695.2018.03.008
摘要 针对K-means聚类算法受初始类中心影响,聚类结果容易陷入局部最优导致聚类准确率较低的问题,提出了一种基于自适应布谷鸟搜索的K-means 聚类改进算法,并利用MapReduce编程模型实现了改进算法的并行化。通过搭建的Hadoop分布式计算平台对不同样本数据集分别进行10次准确性实验和效率实验,结果表明:a)聚类的平均准确率在实验所采用的四种UCI标准数据集上,相比原始K-means聚类算法和基于粒子群优化算法改进的K-means聚类算法都有所提高;b)聚类的平均运行效率在实验所采用的五种大小递增的随机数据集上,当数据量较大时,显著优于原始K-means串行算法,稍好于粒子群优化算法改进的并行K-means聚类算法。可以得出结论,在大数据情景下,应用该算法的聚类效果较好。
关键词 聚类;K-均值算法;布谷鸟搜索算法;Hadoop;MapReduce
基金项目 国家科技重大专项资助项目(2012ZX07-307-002)
本文URL http://www.arocmag.com/article/01-2018-03-008.html
英文标题 Parallel K-means clustering algorithm based on adaptive cuckoo search
作者英文名 Wang Bo, Yu Xiangjun
机构英文名 CollegeofComputerScience,ChongqingUniversity,Chongqing400044,China
英文摘要 The original K-means clustering algorithm is seriously affected by initial centroids of clustering and easy to fall into local optima.So this paper proposed an improved K-means clustering based on adaptive cuckoo search, and achieved the parallelization of the improved algorithm using MapReduce programming model. It implemented accuracy experiments and efficiency experiments 10 times respectively on Hadoop platform for every different data sets, the experimental results show that:a)compared with the original K-means algorithm and PSO-Kmeans, the average accuracy of clustering improved in the experiments which test on four UCI standard data sets;b)tested the average execution efficiency of clustering in the experiments which test on five random incremental data sets, when the amount of data was very large, significantly better than original K-means algorithm, slightly better than PSO-Kmeans. It can be concluded that the algorithm can be applied to large data clustering, and will play a significant effect.
英文关键词 clustering; K-means algorithm; cuckoo search algorithm; Hadoop; MapReduce
参考文献 查看稿件参考文献
  [1] Han Jiawei, Kamber M, Pei Jian, et al. Data mining concepts and techniques[M] . 3rd ed. San Francisco:Morgan Kaumann, 2011:451-456.
[2] 汪中, 刘贵全, 陈恩红. 一种优化初始中心点的K-means 算法[J] . 模式识别与人工智能, 2009, 22(2):299-304.
[3] 李春生, 王耀南. 聚类中心初始化的新方法[J] . 控制理论与应用, 2010, 27(10):1435-1440.
[4] 陶新民, 徐晶, 杨立标, 等. 一种改进的粒子群和K-均值混合聚类算法[J] . 电子与信息学报, 2010, 32(1):92-97.
[5] Lu Bin, Ju Fangyuan. An optimized genetic K-means clustering algorithm[C] //Proc of International Conference on Computer Science and Information Processing. Piscataway:IEEE Press, 2012:1296-1299.
[6] 马汉达, 郝晓宇, 马仁庆. 基于Hadoop的并行PSO-Kmeans算法实现Web日志挖掘[J] . 计算机科学, 2015, 42(s1):470-473.
[7] Yang Xinshe, Deb S. Cuckoo search via Lévy flights[C] //Proc of World Congress on Nature & Biologically Inspired Computing. 2009:210-214.
[8] 郑洪清, 周永权. 一种自适应步长布谷鸟搜索算法[J] . 计算机工程与应用, 2013, 49(10):68-71.
[9] Raveendra. DE based job scheduling in grid enviroments[J] . Journal of Computer Networks, 2013, 1(2):28-31.
[10] 陈乐, 龙文. 求解工程结构优化问题的改进布谷鸟搜索算法[J] . 计算机应用研究, 2014, 31(3):679-683.
[11] Fister I, Yang Xinshe, Fister D, et al. Cuckoo search:a brief literature review[M] //Cuckoo Search and Firefly Algorithm, Volume 516 of the Series Studies in Computational Intelligence. Berlin:Springer-Verlag, 2013:49-62.
[12] Yang Xinshe, Deb S. Cuckoo search:recent advances and applications[J] . Neural Computing and Applications, 2014, 24(1):169-174.
[13] 欧阳喆, 周永权. 自适应步长萤火虫优化算法[J] . 计算机应用, 2011, 31(7):1804-1807.
[14] White T. Hadoop:the definitive guide[M] . 4th ed. Sebastopol:O’Reilly Media, 2015:3-15.
[15] Ghemawat S, Gobioff H, Leung S T. The Google file system[C] //Proc of the 19th ACM Symposium on Operating Systems Principles. 2003:19-43.
[16] Dean J, Ghemawat S. MapReduce:simplified data processing on large clusters[C] //Proc of Conference on Symposium on Opearting Systems Design & Implementation. 2004:107-113.
[17] Chang F, Dean J, Ghemawat S, et al. Bigtable:a distributed storage system for structured data[C] //Proc of USENIX Symposium on Ope-rating Systems Design and Implementation. [S. l. ] :USENIX Association, 2006:15.
[18] 喻金平, 郑杰, 梅宏标. 基于改进人工蜂群算法的K-均值聚类算法[J] . 计算机应用, 2014, 34(4):1065-1069, 1088.
[19] 杨辉华, 王克, 李灵巧, 等. 基于自适应布谷鸟搜索算法的K-means 聚类算法及其应用[J] . 计算机应用, 2016, 36(8):2066-2070.
[20] 周婷, 张君瑛, 罗成. 基于Hadoop的K-means聚类算法的实现[J] . 计算机技术与发展, 2013, 23(7):18-20.
收稿日期 2016/10/27
修回日期 2016/12/14
页码 675-679
中图分类号 TP301.6
文献标志码 A