《计算机应用研究》|Application Research of Computers

引入信息熵的CURE聚类算法

Modified CURE clustering algorithm based on entropy

免费全文下载 (已被下载 次)  
获取PDF全文
作者 伍恒,李文杰,蒋旻
机构 武汉科技大学 a.计算机科学与技术学院;b.智能信息处理与实时工业系统湖北省重点实验室,武汉 430065
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2017)08-2303-03
DOI 10.3969/j.issn.1001-3695.2017.08.014
摘要 为了提高传统CURE(clustering using representatives)聚类算法的质量,引入信息熵对其进行改进。该算法使用K-means算法对样本数据集进行预聚类;采用基于信息熵的相似性度量,利用簇中元素提供的信息度量不同簇之间的相互关系,并描述数据的分布;在高、低层聚类阶段,采取不同的选取策略,分别选取相应的代表点。在UCI和人造数据集上的实验结果表明,提出的算法在一定程度上提高了聚类的准确率,且在大型数据集上比传统CURE算法有着更高的聚类效率。
关键词 层次聚类;CURE算法;信息熵;代表点选取
基金项目 国家自然科学基金资助项目(41571396)
本文URL http://www.arocmag.com/article/01-2017-08-014.html
英文标题 Modified CURE clustering algorithm based on entropy
作者英文名 Wu Heng, Li Wenjie, Jiang Min
机构英文名 a.CollegeofComputerScience&Technology,b.HubeiProvinceKeyLaboratoryofIntelligentInformationProcessing&RealtimeIndustrialSystem,WuhanUniversityofScience&Technology,Wuhan430065,China
英文摘要 In order to improve the clustering quality of the traditional CURE algorithm, this paper proposed a modified CURE algorithm based on entropy. Firstly, this algorithm adopted K-means algorithm to cluster the sample data sets. Then, it introduced a similarity metric based on entropy to measure the relationship between clusters, this metric gathered information contained in the elements of the data sets, also described the distribution. Finally, in the low and high level of the clustering stage, it employed different strategies on representative points selection. The results of experiments on UCI data sets and synthetic data sets indicate that the proposed algorithm achieves better precision to some extent, and it gets better efficiency than the original CURE algorithm on large data sets.
英文关键词 hierarchical clustering; CURE algorithm; entropy; representative points selection
参考文献 查看稿件参考文献
  [1] Han Jiawei, Kamber M. Data ming:concepts and techniques[M] . 3rd ed. Beijing:China Machine Press, 2012.
[2] Guha S, Rastogi R, Shim K. CURE:an efficient clustering algorithm for large database[C] //Proc ofACM SIGMOD International Conference on Management of Data. New York:ACM Press, 1998:73-84.
[3] Qian Yuntao, Shi Qingsong, Wang Qi. CURE-NS:a hierarchical clustering algorithm with new shrinking scheme[C] //Proc of the 1stInternational Conference on Machine Learning and Cybernetics. 2002:895-899.
[4] 沈洁, 赵雷, 杨季文, 等. 一种基于划分的层次聚类算法[J] . 计算机工程与应用, 2007, 43(31):175-177.
[5] 贾瑞玉, 耿锦威, 宁再早, 等. 基于代表点的快速聚类算法[J] . 计算机工程与应用, 2010, 46(33):121-123, 126.
[6] 郝洪星, 朱玉全, 陈耿, 等. 基于划分和层次的混合动态聚类算法[J] . 计算机应用研究, 2011, 28(1):51-53.
[7] Principe J C. Information theoretic learning:Renyi’s entropy and kernel perspectives[M] . [S. l. ] :Springer, 2010.
[8] Araujo D, Neto A D, Martins A. Comparative study on information theoretic clustering and classical clustering algorithms[C] //Proc of the 22nd International Conference on Artificial Neural Networks & Machine Learning. 2012:459-466.
[9] Dash M, Petrutiu S, Scheuermann P. pPOP:Fast yet accurate parallel hierarchical clustering using partitioning[J] . Data & Knowledge Engineering, 2007, 61(3):563-578.
[10] Lichman M. UCI machine learning repository[EB/OL] . (2013)[2016-03-30] . http://archive. ics. uci. edu/ml.
收稿日期 2016/5/31
修回日期 2016/7/8
页码 2303-2305
中图分类号 TP301.6
文献标志码 A