《计算机应用研究》|Application Research of Computers

面向主动学习的模糊核聚类采样算法

Sampling algorithm using kernel-based fuzzy clustering for active learning

Authors 王勇臻, 陈燕, 张金松
Affiliation 大连海事大学 交通运输管理学院, 辽宁 大连 116026
Article ID 1001-3695(2017)12-3564-05
DOI 10.3969/j.issn.1001-3695.2017.12.010
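Abstract To address the difficulty of selecting representative samples when constructing the initial classifier in active learning, this paper proposes a sampling algorithm based on kernel fuzzy clustering. The algorithm first partitions the sample set by cluster analysis, then selects samples for labeling from the center region and the boundary region of each cluster, and finally builds the initial classifier from the labeled samples. A Gaussian kernel function nonlinearly maps points in the original sample space into a high-dimensional feature space so that they become linearly separable for clustering, and an initial cluster center selection method based on local density is introduced to improve the clustering result. To improve sampling quality, a sampling proportion allocation strategy is designed according to the number of samples in each cluster after partitioning; in addition, a supplementary sampling strategy applied at the end of sampling ensures that the required number of samples is reached. Experimental results show that the proposed algorithm effectively reduces the manual labeling burden of constructing the initial classifier and achieves high classification accuracy.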
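The abstract does not give the clustering update rules, so the following is only a minimal sketch of how memberships are commonly computed in kernel fuzzy c-means with a Gaussian kernel. It assumes cluster centers kept in the input space, and the names `gaussian_kernel`, `kernel_fuzzy_memberships`, the kernel width `sigma`, and the fuzzifier `m` are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, V, sigma=1.0):
    """Return K with K[i, k] = exp(-||V[i] - X[k]||^2 / (2 * sigma^2))."""
    sq_dist = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_fuzzy_memberships(X, V, m=2.0, sigma=1.0, eps=1e-12):
    """Fuzzy memberships from kernel-induced distances (assumed variant).

    For a Gaussian kernel K(x, x) = 1, so the squared distance in the
    feature space is ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)).
    """
    # eps avoids division by zero when a sample coincides with a center.
    d2 = 2.0 * (1.0 - gaussian_kernel(X, V, sigma)) + eps  # shape (c, n)
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)  # each column sums to 1

# Toy data: two well-separated groups and one center per group.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
V = np.array([[0.0, 0.0], [5.0, 5.0]])
U = kernel_fuzzy_memberships(X, V)
```

Here `U[i, k]` is the degree to which sample `k` belongs to cluster `i`. Samples with one dominant membership lie near a cluster center, while samples with nearly equal memberships lie near a boundary, which is one plausible way to locate the two sampling regions the abstract mentions.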
Keywords Gaussian kernel function; clustering analysis; sampling; active learning; classification
Funding National Natural Science Foundation of China (71271034)
Natural Science Foundation of Liaoning Province (2014025015)
Young Backbone Teachers Fund (3132016045)
Article URL http://www.arocmag.com/article/01-2017-12-010.html
English title Sampling algorithm using kernel-based fuzzy clustering for active learning
Authors (English) Wang Yongzhen, Chen Yan, Zhang Jinsong
Affiliation (English) College of Transportation Management, Dalian Maritime University, Dalian, Liaoning 116026, China
English abstract Because it is difficult to select representative samples when constructing the initial classifier for active learning, this paper proposes a sampling algorithm using kernel-based fuzzy clustering. The algorithm first partitions the sample set by clustering analysis, then extracts samples from the regions near the center and the boundary of each cluster and labels them, and finally constructs the initial classifier from these labeled samples. It maps points in the original sample space into a high-dimensional feature space with a Gaussian kernel function so that the clusters become linearly separable, and it introduces an initial cluster center selection method based on local density to improve clustering performance. To improve sampling quality, a sampling proportion allocation strategy is designed that uses the number of samples in each cluster. At the end of sampling, a fallback sampling strategy ensures that the required number of samples is reached. The experimental results demonstrate that the proposed algorithm effectively reduces the cost of labeling samples when constructing the initial classifier and achieves high classification accuracy.
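The abstract names a sampling proportion allocation strategy and a fallback step but does not define either. As a hedged illustration only, the sketch below splits a labeling budget across clusters in proportion to their sizes and then tops up any shortfall left by rounding down. The function name `allocate_samples` and the top-up rule (favoring the cluster with the most unsampled points) are assumptions, not the authors' method.

```python
def allocate_samples(cluster_sizes, budget):
    """Split a labeling budget across clusters in proportion to their sizes."""
    total = sum(cluster_sizes)
    if budget > total:
        raise ValueError("budget exceeds the number of available samples")
    # Proportional share, rounded down, never above the cluster size.
    quotas = [min(size, budget * size // total) for size in cluster_sizes]
    # Fallback pass: rounding down can leave the budget unmet, so hand out
    # the shortfall one sample at a time to the cluster that currently has
    # the most unsampled points.
    shortfall = budget - sum(quotas)
    while shortfall > 0:
        i = max(range(len(quotas)), key=lambda j: cluster_sizes[j] - quotas[j])
        quotas[i] += 1
        shortfall -= 1
    return quotas

quotas = allocate_samples([50, 30, 20], 7)
```

With cluster sizes 50, 30, and 20 and a budget of 7, the rounded-down shares are 3, 2, and 1, and the fallback pass assigns the remaining sample to the largest cluster, giving `[4, 2, 1]`.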
Keywords (English) Gaussian kernel function; clustering analysis; sampling; active learning; classification
Received 2016/10/9
Revised 2016/11/22
Pages 3564-3568
CLC number TP301.6
Document code A