Application Research of Computers (计算机应用研究)

Clustering algorithm based on entropy weighted and sparse fractional feature selection in big data (大数据中基于熵加权的稀疏分数特征选择聚类算法)

Authors: 魏霖静, 宁璐璐, 郭斌, 侯振兴
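Affiliations: 1. College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730070; 2. School of Biological Sciences, Nanyang Technological University, Singapore 639798; 3. College of Computer and Information, Hohai University, Nanjing 210094; 4. School of Information Management, Nanjing University, Nanjing 210093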
Article ID: 1001-3695(2018)08-2293-02
DOI: 10.3969/j.issn.1001-3695.2018.08.013
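Abstract: To improve the efficiency of big data statistics and analysis, data sets need to be clustered so as to reduce their dimensionality and remove redundancy among similar data. The proposed method combines entropy weighting with sparse fractional feature selection. On the one hand, it partitions heterogeneous data into local structures, reduces the data dimension, and marks and ranks the importance of the features in each local structure, which improves clustering accuracy; on the other hand, it improves clustering stability. Experiments show that the method is well suited to clustering different kinds of big data.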
Keywords: data clustering; entropy weighting; sparse fraction; feature selection; data dimension; big data
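Funding: National Natural Science Foundation of China (61063028, 31560378)
Natural Science Foundation of Jiangsu Province, Youth Fund (BK20150784)
China Postdoctoral Science Foundation, General Program (2015M581800)
Gansu Province Science and Technology Support Program (1604WKCA011)
Longyuan Youth Innovation and Entrepreneurship Talent Project (2016-47)
2016 Gansu Province Higher Education Major Soft Science (Strategy) Research Program (2016F-10)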
Article URL: http://www.arocmag.com/article/01-2018-08-013.html
Authors (English): Wei Linjing, Ning Lulu, Guo Bin, Hou Zhenxing
Affiliations (English): 1. School of Information Science & Technology, Gansu Agriculture University, Lanzhou 730070, China; 2. School of Biological Sciences, Nanyang Technological University, Singapore City 639798, Singapore; 3. School of Computer Science & Engineering, Hohai University, Nanjing 210094, China; 4. School of Information Science & Engineering, Nanjing University, Nanjing 210093, China
English abstract: To improve the efficiency of data statistics and analysis, it is necessary to cluster data sets in order to reduce their dimensionality and remove redundancy among similar data. This paper combines entropy weighting with sparse fractional feature selection. On the one hand, the method divides heterogeneous data into local structures, reduces the data dimension, and marks and sorts the feature importance of each local structure, which improves clustering accuracy; on the other hand, it improves clustering stability. Experimental results show that the method is broadly applicable to clustering different kinds of big data.
English keywords: data clustering; entropy weighted; sparse fraction; feature selection; data dimension; big data
Received: 2017/4/10
Revised: 2017/6/2
Pages: 2293-2294,2303
CLC number: TP274
Document code: A