《计算机应用研究》|Application Research of Computers

支持差分隐私保护及离群点消除的并行K-means算法

Parallel K-means algorithm with differential privacy preservation and outlier pruning

免费全文下载 (已被下载 次)  
获取PDF全文
作者 樊一康,刘建伟
机构 北京航空航天大学 电子信息工程学院,北京 100191
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2019)06-037-1776-06
DOI 10.19734/j.issn.1001-3695.2017.12.0825
摘要 针对大数据环境下聚类分析的隐私保护问题,基于MapReduce计算框架,提出了一种并行化的支持差分隐私保护和离群点消除的K-means算法。算法并行地计算数据集中各点间的欧氏距离矩阵与最近邻超球半径以导出离群点的判定阈值,并在此基础上完成差分隐私保护下的初始聚类中心选取和并行聚类过程。理论分析证明整个算法满足<i>ε</i>-差分隐私保护,实验结果说明该算法在隐私保护的有效性、聚类结果的可用性以及执行效率等方面取得了很好的平衡,相比于同类算法有较优的表现。
关键词 K-均值聚类; 离群点消除; 差分隐私; MapReduce
基金项目 国家自然科学基金资助项目(61272501)
本文URL http://www.arocmag.com/article/01-2019-06-037.html
英文标题 Parallel K-means algorithm with differential privacy preservation and outlier pruning
作者英文名 Fan Yikang, Liu Jianwei
机构英文名 School of Electronic Information Engineering,Beihang University,Beijing 100191,China
英文摘要 Aiming at the problem of privacy protection of clustering analysis in big data environment, based on the MapReduce computing framework, this paper proposed a parallel K-means algorithm that supported differential privacy protection and outlier elimination. The algorithm parallelly calculated the Euclidean distance matrix and nearest neighbor hypersphere radius between points in data set to derive the decision threshold of outliers, and then completed the initial cluster center selection and parallel clustering process under differential privacy protection. The theoretical analysis proves that the proposed algorithm satisfies <i>ε</i>-differential privacy, and the experimental results show that, compared with other algorithms, this algorithm performs better and has a good balance between the validity of privacy protection, the availability of clustering result and the efficiency of implementation.
英文关键词 K-means clustering; outlier pruning(DP); differential privacy(DP); MapReduce
参考文献 查看稿件参考文献
 
收稿日期 2017/12/27
修回日期 2018/2/27
页码 1776-1781,1787
中图分类号 TP309.2
文献标志码 A