Application Research of Computers

Survey of K-means algorithm on big data

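Author: 任远航
Affiliation: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054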
Article ID: 1001-3695(2020)12-002-3528-06
DOI: 10.19734/j.issn.1001-3695.2019.10.0581
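Abstract: Clustering is an important data mining technique, and for K-means a key problem is how to obtain, more quickly on massive data, an approximate solution with theoretical guarantees. This paper first defines the K-means problem and introduces the relevant background; it then presents advanced research results from China and abroad in two areas, theoretical guarantees and acceleration; finally, it summarizes existing results and offers an outlook on future research directions for K-means on big data.
Keywords: clustering; K-means; sampling; sublinear-time algorithms; theoretical guarantee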
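The abstract says the paper first defines the K-means problem. For reference, the standard formulation and the usual meaning of an approximate solution "with theoretical guarantees" are sketched below; the notation ($X$, $C$, $\mathrm{OPT}$, $\varepsilon$) is assumed here and may differ from the survey's own.

```latex
Given $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ and an integer $k \ge 1$,
find a center set $C \subset \mathbb{R}^d$ with $|C| = k$ minimizing
\[
  \mathrm{cost}(X, C) = \sum_{i=1}^{n} \min_{c \in C} \lVert x_i - c \rVert_2^2 .
\]
Writing $\mathrm{OPT} = \min_{|C| = k} \mathrm{cost}(X, C)$, a
$(1+\varepsilon)$-approximation is any $C$ with $|C| = k$ and
\[
  \mathrm{cost}(X, C) \le (1 + \varepsilon)\,\mathrm{OPT}.
\]
```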
URL: http://www.arocmag.com/article/01-2020-12-002.html
English title: Survey of K-means algorithm on big data
English author name: Ren Yuanhang
English affiliation: School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 610054, China
English abstract: Among all clustering problems, the K-means problem is probably the best known. How to efficiently obtain a solution to K-means with theoretical guarantees on big data is a key problem. This paper surveyed the progress on this problem. First, it defined the K-means problem and introduced the relevant background. Second, it described in detail, separately, the techniques for theoretical guarantees and for speed-up. Finally, it summarized the main results and forecast future directions of K-means algorithms on big data.
English keywords: clustering; K-means; sampling; sub-linear time algorithms; theoretical guarantee
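The keywords pair sampling with theoretical guarantees. One widely known instance of that combination, not necessarily one this survey emphasizes, is k-means++ seeding (D² sampling), which gives an expected O(log k)-approximation and is commonly followed by Lloyd's iterations. The sketch below assumes NumPy and dense in-memory data; the function names (`kmeanspp_seed`, `lloyd`, `kmeans_cost`) are hypothetical, and it does not implement any sublinear-time method.

```python
import numpy as np


def sq_dists(X, centers):
    # Pairwise squared Euclidean distances, shape (n, k).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans_cost(X, centers):
    # K-means objective: each point pays its squared distance to the nearest center.
    return float(sq_dists(X, centers).min(axis=1).sum())


def kmeanspp_seed(X, k, rng):
    # k-means++ (D^2 sampling): first center uniformly at random, each later
    # center with probability proportional to its squared distance to the
    # nearest center chosen so far.
    n = X.shape[0]
    first = rng.integers(n)
    centers = [X[first]]
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        # If every point already coincides with a center, fall back to uniform.
        idx = rng.integers(n) if total == 0 else rng.choice(n, p=d2 / total)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)


def lloyd(X, centers, max_iter=50):
    # Lloyd's iterations: assign points to the nearest center, then move each
    # center to the mean of its cluster (an empty cluster keeps its center).
    # The cost never increases from one iteration to the next.
    centers = centers.copy()
    for _ in range(max_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, sq_dists(X, centers).argmin(axis=1)


# Usage: three well-separated synthetic Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2))
               for c in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]])
C0 = kmeanspp_seed(X, 3, rng)
C, labels = lloyd(X, C0)
print(kmeans_cost(X, C0), kmeans_cost(X, C))
```

Seeding alone already carries the O(log k) guarantee in expectation; the Lloyd refinement can only lower the cost, which is why the two are usually combined.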
Received: 2019/10/12
Revised: 2019/12/8
Pages: 3528-3533
CLC number: TP301.6
Document code: A