《计算机应用研究》|Application Research of Computers

基于约束优化传播的改进大规模数据半监督式谱聚类算法

Constrain optimal propagation-based improved semi-supervised spectral clustering algorithm for large-scale data

免费全文下载 (已被下载 次)  
获取PDF全文
作者 徐达宇,郁莹珺,冯海林,张旭尧
机构 1.浙江农林大学 信息工程学院,杭州 311300;2.信雅达系统工程股份有限公司,杭州 310053
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2018)05-1325-06
DOI 10.3969/j.issn.1001-3695.2018.05.010
摘要 针对传统谱聚类算法在聚类过程中所出现的高计算复杂度、噪声敏感,以及聚类簇形态偏斜等问题,结合当前大规模数据聚类的特点与需求,建立基于约束优化传播的改进大规模数据半监督式谱聚类模型。该模型利用先验成对点约束信息构建微型相似性矩阵,在此基础上采用Gabow算法提取该微型相似性矩阵所对应连通图的各强连通分支,继而提出面向各强连通分支的新型约束优化传播算法以获取整个数据集的点对相似度,最后通过奇异值分解并运用加速K-means算法获得大规模数据的聚类结果。在多个标准测试数据集上的实验表明,相比于该领域其他前期研究成果,该聚类模型具有更高的聚类准确率和更低的计算复杂度,更适合大规模数据的聚类应用。
关键词 谱聚类;大规模数据;点对约束;相似性传播;奇异值分解
基金项目 国家自然科学基金资助项目(61272313)
浙江省自然科学基金项目(LQ17G010003)
浙江省重大科技专项项目(2015C03008)
本文URL http://www.arocmag.com/article/01-2018-05-010.html
英文标题 Constrain optimal propagation-based improved semi-supervised spectral clustering algorithm for large-scale data
作者英文名 Xu Dayu, Yu Yingjun, Feng Hailin, Zhang Xuyao
机构英文名 1.SchoolofInformationEngineering,ZhejiangA&FUniversity,Hangzhou311300,China;2.SunyardSystemEngineeringCo.,LTD.,Hangzhou310053,China
英文摘要 Focusing on the problem of high computational complexity, noise sensitivity and the shape deviation of cluster in the clustering process of traditional spectral clustering, and combining the characteristics with the need of current large-scale data clustering, this article established the semi-supervised of large-scale data model based on constrained optimal propagation. First, it constructed the micro similarity matrix by using prior dotted pair constraint information. On this basis, it used the Gabow algorithm to extract the micro similarity matrix corresponding connected graph of each strongly connected component. Then, it proposed a new constrained optimization propagation algorithm for each strongly connected component to obtained the similarity of the point of the whole data set. Finally, it could obtain the clustering results of large scale data by using the singular value decomposition and the accelerated K-means algorithm. Experiments on multiple standard testing data sets show that compared with other previous research results in this field, the proposed clustering model has higher clustering accuracy and lower computation complexity and is more suitable for large-scale data clustering applications.
英文关键词 spectral clustering; large-scale data; pairwise constraint; affinity propagation; singular value decomposition
参考文献 查看稿件参考文献
  [1] 管涛, 杨婷. 谱聚类广义模型与典型算法分析[J] . 模式识别与人工智能, 2014, 27(11):1015-1025.
[2] Du Hui, Wang Yuping, Dong Xiaopan, et al. Texture image segmentation using spectral clustering[C] //Proc of HCI International Pos-ters’ Extended Abstracts. [S. l. ] :Springer International Publishing, 2015:671-676.
[3] Ahn I, Kim C. Face and hair region labeling using semi-supervised spectral clustering-based multiple segmentations[J] . IEEE Trans on Multimedia, 2016, 18(7):1414-1421.
[4] Xiao Xiang, Liu Le, Hu Haifeng. Discriminative feature fusion with spectral method for human action recognition[C] //Proc of Chinese Conference on Biometric Recognition. [S. l. ] :Springer International Publishing, 2015:641-648.
[5] Bergamasco L C C, Oliveira R A P, Wechsler H, et al. Content-based image retrieval of 3D cardiac models to aid the diagnosis of congestive heart failure by using spectral clustering[C] //Proc of the 28th International Symposium on Computer-Based Medical Systems. Pisca-taway, NJ:IEEE Press, 2015:183-186.
[6] Paccanaro A, Casbon J A, Saqi M A S. Spectral clustering of protein sequences[J] . Nucleic Acids Research, 2006, 34(5):1571-1580.
[7] Flake G W, Tarjan R E, Tsioutsiouliklis K. Graph clustering and minimum cut trees[J] . Internet Mathematics, 2003, 1(4):385-408.
[8] Shi J, Malik J. Normalized cuts and image segmentation[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 2000, 22(8):888-905.
[9] Chan P K, Schlag M D F, Zien J Y. Spectral K-way ratio-cut partitioning and clustering[J] . IEEE Trans on Computer Aided Design of Integrated Circuits and Systems, 1994, 13(9):1088-1096.
[10] Bolla M Z. Multiway cuts and spectra[M] //Spectral Clustering and Biclustering:Learning Large Graphs and Contingency Tables. [S. l. ] :Wiley, 2013:44-95.
[11] Ding Shifei, Jia Hongjie, Zhang Liwei, et al. Research of semi-supervised spectral clustering algorithm based on pairwise constraints[J] . Neural Computing and Applications, 2014, 24(1):211- 219.
[12] Lu Canyi, Yan Shuicheng, Lin Zhouchen. Convex sparse spectral clustering:single-view to multi-view[J] . IEEE Trans on Image Processing, 2016, 25(6):2833-2843.
[13] 赵凤焦, 李成, 刘汉强, 等. 半监督谱聚类特征向量选择算法[J] . 模式识别与人工智能, 2011, 24(1):48-56.
[14] Sourati J, Erdogmus D, Dy J G, et al. Accelerated learning-based interactive image segmentation using pairwise constraints[J] . IEEE Trans on Image Processing, 2014, 23(7):3057-3070.
[15] Cao Jiangzhong, Chen Pei, Dai Qingyun, et al. Local information-based fast approximate spectral clustering[J] . Pattern Recognition Letters, 2014, 38(1):63-69.
[16] Kang U, Meeder B, Papalexakis E E, et al. HEigen:spectral analysis for billion-scale graphs[J] . IEEE Trans on Knowledge and Data Engineering, 2014, 26(2):350-362.
[17] Rafailidis D, Constantinou E, Manolopou-los Y. Scalable spectral clustering with weighted PageRank[C] //Proc of International Confe-rence on Model and Data Engineering. [S. l. ] :Springer International Publishing, 2014:289-300.
[18] Chen Xinlei, Cai Deng. Large scale spectral clustering with landmark-based representation[C] //Proc of the 25th AAAI Conference on Artificial Intelligence. Palo Alto, CA:AAAI Press, 2011:313-318.
[19] Eubank R L. Applied nonparametric regression[J] . Technometrics, 1991, 35(2):225- 226.
[20] Gabow H N. Path-based depth-first search for strong and biconnected components[J] . Information Processing Letters, 2000, 74(3):107-114.
[21] Drake J, Hamerly G. Accelerated K-means with adaptive distance bounds[C] //Proc of the 5th NIPS Workshop on Optimization for Machine Learning. 2012.
[22] MATLAB codes and datasets for feature learning[EB/OL] . http://www. cad. zju. edu. cn/home/dengcai/Data/data. html.
[23] Datasets for “the elements of statistical learning”[EB/OL] . http://www. stat. stanford. edu/~tibs/ElemStatLearn/data. html.
[24] Ng A Y, Jordan M I, Weiss Y. On spectral clustering:analysis and an algorithm[C] //Proc of the 14th International Conference on Neural Information Processing Systems Natural Synthetic. Cambridge, MA:MIT Press, 2002:849-856.
[25] Chen Weifu, Feng Guocan. Spectral clustering:a semi-supervised approach[J] . Neurocomputing, 2012, 77(1):229-242.
[26] Fowlkes C, Belongie S, Fan C, et al. Spectral grouping using the Nystrm method[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 2004, 26(2):214-225.
[27] Dhillon I S, Guan Yuqiang, Kulis B. Weighted graph cuts without eigenvectors a multilevel approach[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 2007, 29(11):1944-1957.
收稿日期 2016/12/29
修回日期 2017/3/6
页码 1325-1330
中图分类号 TP391;TP301.6
文献标志码 A