《计算机应用研究》|Application Research of Computers

基于差分隐私的不确定数据频繁项集挖掘算法

Frequent itemsets mining for uncertain data based on differential privacy

免费全文下载 (已被下载 次)  
获取PDF全文
作者 丁哲,秦臻,秦志光
机构 电子科技大学 信息与软件工程学院,成都 610054
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2018)07-1942-05
DOI 10.3969/j.issn.1001-3695.2018.07.004
摘要 基于不确定数据的频繁项集挖掘算法已经得到了广泛的研究。对于记录用户敏感信息的不确定数据,攻击者可以利用自己掌握的背景信息,通过分析基于不确定数据的频繁项集从而获得用户的敏感信息。为了从不确定的数据集中挖掘出基于期望支持度的前K个最频繁的频繁项集,并且保证挖掘结果满足差分隐私,提出了FIMUDDP(frequent itemsets mining for uncertain data based on differential privacy)算法。FIMUDDP算法利用差分隐私的指数机制和拉普拉斯机制确保从不确定数据中挖掘出的基于期望支持度的前K个最频繁的频繁项集和这些频繁项集的期望支持度满足差分隐私。通过对FIMUDDP进行理论分析和实验评估,验证了FIMUDDP算法的有效性。
关键词 差分隐私;不确定数据的频繁项集;截断期望支持度
基金项目 国家自然科学基金资助项目(61672135,61370026)
国家“863”计划资助项目(2015AA016007)
四川省科技计划资助项目(2015GZ0095,2016JZ0020)
国家自然科学基金委员会—广东省人民政府自然科学联合基金重点项目(U1401257)
本文URL http://www.arocmag.com/article/01-2018-07-004.html
英文标题 Frequent itemsets mining for uncertain data based on differential privacy
作者英文名 Ding Zhe, Qin Zhen, Qin Zhiguang
机构英文名 ShoolofInformation&SoftwareEngineering,UniversityofElectronicScience&TechnologyofChina,Chengdu610054,China
英文摘要 Frequent itemsets mining for uncertain data has been studied extensively. For uncertain data which recorded private information of users, an attacker made use of background knowledge to obtain private information of users by analyzing frequent itemsets mined from uncertain data. In this regard, this paper proposed a new algorithm, denoted as FIMUDDP (frequent itemsets mining for uncertain data based on differential privacy), to mine the top K most frequent itemsets based on expected support from uncertain data and satisfy differential privacy. FIMUDDP algorithm applied exponential mechanism and Laplace mechanism in differential privacy to ensure differential privacy for the top K most frequent itemsets based on expected supports of these frequent itemsets respectively. Finally, through analyzing FIMUDDP from theory and experiment evaluation, the results demonstrate the effectiveness of FIMUDDP.
英文关键词 differential privacy; frequent itemsets for uncertain data; truncated expected support
参考文献 查看稿件参考文献
  [1] Friedman A, Schuster A. Data mining with differential privacy[C] //Proc of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2010:493-502.
[2] Tong Yongxin, Chen Lei, Cheng Yurong, et al. Mining frequent itemsets over uncertain databases[J] . Proceedings of the VLDB Endowment, 2012, 5(11):1650-1661.
[3] Bhaskar R, Laxman S, Smith A, et al. Discovering frequent patterns in sensitive data[C] //Proc of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2010:503-512.
[4] Chui C K, Kao Ben, Hung E. Mining frequent itemsets from uncertain data[C] //Proc of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin:Springer-Verlag, 2007:47-58.
[5] Han Jiawei, Kamber M, Pei Jian. Data mining:concepts and techniques[M] . 3rd ed. San Francisco:Morgan Kaufmann Publishers Inc, 2011.
[6] Leung C K S, Mateo M A F, Brajczuk D A. A tree-based approach for frequent pattern mining from uncertain data[C] //Proc of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin:Springer-Verlag, 2008:653-661.
[7] Bernecker T, Kriegel H P, Renz M, et al. Probabilistic frequent itemset mining in uncertain databases[C] //Proc of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2009:119-128.
[8] Calders T, Garboni C, Goethals B. Approximation of frequentness probability of itemsets in uncertain data[C] // Proc of IEEE International Conference on Data Mining. Washington DC:IEEE Computer Society, 2010:749-754.
[9] Peterson E A, Tang Peiyi. Fast approximation of probabilistic frequent closed itemsets[C] //Proc of the 50th Annual Southeast Regional Conference. New York:ACM Press, 2012:214-219.
[10] Tang Peiyi, Peterson E A. Mining probabilistic frequent closed itemsets in uncertain databases[C] //Proc of the 49th Annual Southeast Regional Conference. New York:ACM Press, 2011:86-91.
[11] Tong Yongxin, Chen Lei, Ding Bolin. Discovering threshold-based frequent closed itemsets over probabilistic data[C] //Proc of the 28th International Conference on Data Engineering. Washington DC:IEEE Computer Society, 2012:270-281.
[12] Lin C W J, Gan Wensheng, Fournier-Viger P, et al. Weighted frequent itemset mining over uncertain databases[J] . Applied Intelligence, 2016, 44(1):232-250.
[13] Ahmed A U, Ahmed C F, Samiullah M, et al. Mining interesting patterns from uncertain databases[J] . Information Sciences, 2016, 354(8):60-85.
[14] Lin C W J, Gan Wensheng, Fournier-Viger P, et al. Efficient algorithms for mining high-utility itemsets in uncertain databases[J] . Knowledge-Based Systems, 2016, 96(3):171-187.
[15] Dwork C, McSherry F, Nissim K, et al. Calibrating noise to sensitivity in private data analysis[C] //Proc of the 3rd Conference on Theory of Cryptography. Berlin:Springer-Verlag, 2006:265-284.
[16] McSherry F, Talwar K. Mechanism design via differential privacy[C] //Proc of the 48th Annual IEEE Symposium on Foundations of Computer Science. Washington DC:IEEE Computer Society, 2007:94-103.
[17] Li Ninghui, Qardaji W, Su Dong, et al. PrivBasis:frequent itemset mining with differential privacy[J] . Proceedings of the VLDB Endowment, 2012, 5(11):1340-1351.
[18] Su Sen, Xu Shengzhi, Cheng Xiang, et al. Differentially private frequent itemset mining via transaction splitting[J] . IEEE Trans on Knowledge and Data Engineering, 2015, 27(7):1875-1891.
[19] Maruseac M, Ghinita G. Differentially-private mining of moderately-frequent high-confidence association rules[C] //Proc of the 5th ACM Conference on Data and Application Security and Privacy. New York:ACM Press, 2015:13-24.
[20] Xu Shengzhi, Cheng Xiang, Su Sen, et al. Differentially private frequent sequence mining[J] . IEEE Trans on Knowledge and Data Engineering, 2016, 28(11):2910-2926.
收稿日期 2017/3/23
修回日期 2017/4/30
页码 1942-1946
中图分类号 TP301.6
文献标志码 A