《计算机应用研究》|Application Research of Computers

Uncertain data streams clustering algorithm based on grid density and force (基于网格密度和引力的不确定数据流聚类算法)

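Authors XING Chang-zheng (邢长征), WEN Pei (温培)
Affiliation School of Electronic & Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China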
Article ID 1001-3695(2015)01-0098-04
DOI 10.3969/j.issn.1001-3695.2015.01.023
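Abstract To address the shortcomings of the EMicro algorithm, this paper proposes the GDF-CUStreams algorithm. The algorithm stores the distribution characteristics of the data in grid feature vectors, clusters the uncertain data stream by updating these vectors and merging grids into clusters, and handles newly arriving data points by incremental clustering. It uses grid density and the distance between grid centroids to decide whether a grid is a sporadic grid, and uses grid force (the attraction between grids) to optimize cluster boundaries by detecting and deleting sporadic grids, which makes cluster edges smoother and improves clustering accuracy. Both grid density and grid centroid are updated incrementally. Experimental results show that, compared with EMicro, GDF-CUStreams is more efficient and produces good clustering results.
Keywords uncertain data streams; grid feature vector; grid density; grid force; sporadic grid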
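The abstract names the ingredients of the method but gives no formulas, so the following Python sketch is only one plausible reading of them, not the authors' GDF-CUStreams implementation. It assumes that each uncertain point carries an existence probability, that grid density is the sum of those probabilities, that the grid centroid is the probability-weighted mean, and that grid force is a gravity-like term (product of densities over squared centroid distance). The names `GridFeature`, `GridIndex`, `min_density`, and `min_force` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from math import dist, floor


@dataclass
class GridFeature:
    """Hypothetical grid feature vector: density plus weighted centroid."""
    density: float = 0.0   # assumed: sum of existence probabilities
    centroid: tuple = ()   # assumed: probability-weighted mean of the points

    def add(self, point, prob):
        """Fold one uncertain point into the cell without storing the point."""
        if prob <= 0:
            return
        new_density = self.density + prob
        if not self.centroid:
            self.centroid = tuple(point)
        else:
            w = prob / new_density  # incremental weighted-mean update
            self.centroid = tuple(
                c + w * (x - c) for c, x in zip(self.centroid, point)
            )
        self.density = new_density


class GridIndex:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(GridFeature)

    def key(self, point):
        return tuple(floor(x / self.cell_size) for x in point)

    def insert(self, point, prob):
        self.cells[self.key(point)].add(point, prob)

    def neighbors(self, key):
        """Yield the non-empty cells adjacent to `key` (diagonals included)."""
        for offset in product((-1, 0, 1), repeat=len(key)):
            if any(offset):
                k = tuple(a + b for a, b in zip(key, offset))
                if k in self.cells:
                    yield k

    def force(self, a, b):
        """Assumed gravity-like attraction between two non-empty cells."""
        ga, gb = self.cells[a], self.cells[b]
        d = dist(ga.centroid, gb.centroid)
        return ga.density * gb.density / (d * d) if d > 0 else float("inf")

    def sporadic(self, min_density, min_force):
        """Low-density cells that neighbouring cells barely attract."""
        flagged = []
        for k, g in self.cells.items():
            if g.density >= min_density:
                continue
            pull = sum(self.force(k, n) for n in self.neighbors(k))
            if pull < min_force:
                flagged.append(k)
        return flagged


grid = GridIndex(cell_size=1.0)
stream = [
    ((0.2, 0.3), 0.9), ((0.4, 0.5), 0.8), ((0.6, 0.6), 0.9),  # dense cell
    ((1.1, 0.5), 0.9), ((1.3, 0.4), 0.7),                     # dense neighbour
    ((1.2, 1.1), 0.4),                                        # sparse edge cell
    ((5.5, 5.5), 0.3),                                        # isolated cell
]
for point, prob in stream:
    grid.insert(point, prob)
print(grid.sporadic(min_density=1.0, min_force=1.0))
```

In this toy stream the sparse edge cell is kept because its dense neighbours attract it strongly, while the isolated cell is flagged. Both thresholds are illustrative values chosen for the example, not parameters reported in the paper.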
Article URL http://www.arocmag.com/article/01-2015-01-023.html
English abstract To address the deficiencies of the EMicro algorithm, this paper proposes the GDF-CUStreams algorithm. It uses a grid feature vector to store the distribution characteristics of the data, clusters the uncertain data stream by updating the grid feature vectors and merging them into clusters, and applies incremental clustering to newly arriving data points. Using the grid density and the distance between grid centroids to determine whether a grid on a cluster edge is a sporadic grid, it applies grid force to detect and remove sporadic grids, which optimizes the cluster boundary and improves clustering precision. Grid density and grid centroid are both updated incrementally. Experimental results show that, compared with the EMicro algorithm, GDF-CUStreams achieves better clustering effectiveness and better processing efficiency.
Received 2013-12-01
Revised 2014-01-23
Pages 98-101
CLC number TP181; TP301.6
Document code A