《计算机应用研究》|Application Research of Computers

基于加权K近邻的改进密度峰值聚类算法

Improved density peak clustering algorithm based on weighted K-nearest neighbor

Authors Yang Zhen, Wang Hongjun
Affiliation National University of Defense Technology, Hefei 230037
Article ID 1001-3695(2020)03-006-0667-05
DOI 10.19734/j.issn.1001-3695.2018.08.0656
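Abstract Density peak clustering is a novel density-based clustering algorithm. However, the original algorithm considers only the global structure of the data, so it performs poorly on datasets with non-uniform distribution, and it selects cluster centers solely from the distribution of points on the decision graph, without a reliable selection criterion. To address these problems, an improved density peak clustering algorithm based on weighted K-nearest neighbors is proposed. It brings the idea of the nearest neighbor algorithm into density peak clustering, redefines and computes the local density of each data point, and identifies the critical point for cluster centers from the trend in the slope of the weights. Experiments on artificial datasets and real UCI datasets compare the improved algorithm with the original density peak clustering, K-means, and DBSCAN algorithms. They show that the improved algorithm clusters datasets of non-uniform density effectively, finds clusters of arbitrary shape, and generally scores higher than the other three algorithms on three clustering performance metrics.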
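The abstract does not give the formula for the redefined local density, so the following is only a minimal sketch of the general idea under stated assumptions: the density of a point is taken as the sum of `exp(-d)` over its `k` nearest neighbors (an assumed weighting, not the authors' definition), and the distance `delta` to the nearest denser point follows the original density peak clustering algorithm. The function name `knn_density_and_delta` is hypothetical.

```python
import numpy as np

def knn_density_and_delta(points, k):
    """Return (rho, delta) for each point.

    rho:   sum of exp(-d) over the k nearest neighbors. This weighting is
           an assumption made for illustration, not the paper's formula.
    delta: distance to the nearest point of higher density, as in the
           original density peak clustering algorithm.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-knn_dist).sum(axis=1)

    # Visit points from densest to sparsest. The densest point gets its
    # largest distance to any other point; every other point gets the
    # distance to the closest point visited before it.
    order = np.argsort(-rho)
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta
```

Because the density depends only on each point's own neighborhood, a sparse cluster still produces one point with a large `delta`, which is the property that helps on datasets of non-uniform density.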
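Keywords data mining; weighted K-nearest neighbor; density peaks; clustering
Funding Supported by the National Natural Science Foundation of China
Article URL http://www.arocmag.com/article/01-2020-03-006.html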
English title Improved density peak clustering algorithm based on weighted K-nearest neighbor
Authors (English) Yang Zhen, Wang Hongjun
Affiliation (English) National University of Defense Technology, Hefei 230037, China
English abstract The density peak clustering algorithm is a recent density-based clustering algorithm that requires only one input parameter and no repeated iteration. However, the original algorithm considers only the global structure of the data, so it performs poorly on datasets with non-uniform distribution. Moreover, it selects cluster centers solely from the distribution of points on the decision graph, which is not a reliable criterion. To address these problems, this paper proposes an improved density peak clustering algorithm based on weighted K-nearest neighbors. It introduces the idea of the nearest neighbor algorithm into density peak clustering, redefines and computes the local density of each data point, and determines the critical point for cluster centers from the trend in the slope of the weights. The improved algorithm is compared with the original density peak clustering, K-means, and DBSCAN algorithms in experiments on artificial datasets and real UCI datasets. The results show that the improved algorithm handles datasets of non-uniform density and finds clusters of arbitrary shape, and that it generally scores higher than the other three algorithms on three clustering performance metrics.
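The abstract says cluster centers are separated from other points by the trend in the slope of the weights, but it does not state the exact rule. As a rough illustration only, the sketch below sorts one decision value per point in descending order and cuts at the rank where the drop between consecutive values shrinks the most. The function name `pick_centers_by_slope`, the input `gamma`, and this particular knee heuristic are all assumptions, not the authors' criterion.

```python
import numpy as np

def pick_centers_by_slope(gamma):
    """Return indices of candidate cluster centers.

    gamma: one decision value per point (for example density times
           distance to the nearest denser point); needs at least 3 values.
    The cut is placed where the sorted curve bends from steep to flat.
    This is a simple stand-in heuristic, not the paper's exact rule.
    """
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(-gamma)              # indices, largest value first
    sorted_gamma = gamma[order]

    # Drop between consecutive sorted values (non-negative slopes).
    drops = sorted_gamma[:-1] - sorted_gamma[1:]

    # How much each drop exceeds the one that follows it.
    bend = drops[:-1] - drops[1:]

    # The largest bend marks the last steep drop; everything ranked
    # before that drop is kept as a center.
    n_centers = int(np.argmax(bend)) + 1
    return order[:n_centers]
```

For instance, with the values `[9.0, 0.5, 8.0, 0.4, 0.3, 0.2]` the large drop falls between 8.0 and 0.5, so the points at indices 0 and 2 are returned as centers.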
Keywords (English) data mining; weighted K-nearest neighbor; density peaks; clustering
Received 2018/8/31
Revised 2018/10/26
Pages 667-671
CLC number TP301.6
Document code A