《计算机应用研究》|Application Research of Computers

Study on algorithm for mining structural binary tree in data stream (数据流中结构二叉树挖掘算法研究)

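Authors Tang Xianghong (唐向红), Yuan Ning (元宁), Yi Xianghua (易向华), Lu Jianguang (陆见光)
Affiliations 1. Guizhou University, a. Key Laboratory of Advanced Manufacturing Technology of Ministry of Education; b. School of Mechanical Engineering, Guiyang 550025; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025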
Article ID 1001-3695(2017)10-2924-05
DOI 10.3969/j.issn.1001-3695.2017.10.010
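Abstract Traditional data stream mining algorithms cannot discover the relationships between frequent items, have high time and space complexity, and offer low accuracy. To address these problems, this paper proposes an algorithm for mining structural binary trees in data streams (AMST). The algorithm exploits the binary tree structure: it converts the data stream in the transaction database being processed into a structured binary tree and then mines that tree using a data stream matrix. The whole process scans the transaction database only once, which greatly improves mining efficiency; the algorithm also finds frequent subtrees that have hierarchical relationships. Experimental results show that AMST performs stably, has clear advantages in time and space complexity, and can mine data streams quickly and accurately.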
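The abstract does not give the construction of the structured binary tree or the layout of the data stream matrix, so the following is not AMST itself. It is only a minimal sketch of the single-scan idea the abstract relies on: each transaction is read exactly once while item counts and pairwise co-occurrence counts are accumulated. The function names, the sample transactions, and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence


def one_pass_counts(stream: Iterable[Sequence[str]]):
    """Scan the transaction stream once, counting items and item pairs.

    Hypothetical helper: the pair counter plays the role of a simple
    co-occurrence matrix, not the paper's data stream matrix.
    """
    item_counts = Counter()
    pair_counts = Counter()
    n = 0
    for transaction in stream:
        n += 1
        # Deduplicate and sort so each pair has one canonical ordering.
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return n, item_counts, pair_counts


def frequent(counts: Counter, n: int, min_support: float):
    """Keep entries whose relative support is at least min_support."""
    if n == 0:
        return {}
    return {key: c for key, c in counts.items() if c / n >= min_support}


# Illustrative data only; not taken from the paper's experiments.
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c"],
]
# iter(...) makes the input a one-shot stream, so it cannot be rescanned.
n, item_counts, pair_counts = one_pass_counts(iter(transactions))
frequent_items = frequent(item_counts, n, min_support=0.5)
frequent_pairs = frequent(pair_counts, n, min_support=0.5)
print(frequent_items)
print(frequent_pairs)
```

Because the counters are updated per transaction, the input can be a generator that is consumed as data arrives, which is the property a stream setting needs. Finding hierarchical frequent subtrees, as the paper reports, would require the tree structure that this sketch leaves out.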
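Keywords data stream; frequent itemsets; structural binary tree; data stream matrix
Funding Major Science and Technology Special Project of Guizhou Province (黔科合重大专项字(2013)6019, 黔科合重大专项字〔2012〕6018)
Major Basic Research Project of Guizhou Province (黔科合JZ字(2014)2001)
Article URL http://www.arocmag.com/article/01-2017-10-010.html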
English title Study on algorithm for mining structural binary tree in data stream
Author names (English) Tang Xianghong, Yuan Ning, Yi Xianghua, Lu Jianguang
Affiliations (English) 1. a. Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, b. School of Mechanical Engineering, Guizhou University, Guiyang 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, China
English abstract Traditional data stream algorithms cannot mine the relationships between frequent items, and their mining time and space complexity is high while their accuracy is low. To address these problems, this paper proposes an algorithm for mining structural binary trees in data streams (AMST). The algorithm takes advantage of the binary tree structure, transforms the data stream into a structured binary tree, and then mines the binary tree with the data stream matrix. Because it scans the database only once in the whole process, the algorithm greatly improves mining efficiency. In addition, it finds frequent subtrees with hierarchical relations. Experimental results show that the algorithm has stable performance, has advantages in time and space complexity, and can mine the data stream rapidly and accurately.
English keywords data stream; frequent items; structural binary tree; data stream matrix
Received 2016/7/23
Revised 2016/9/12
Pages 2924-2928
CLC number TP301.6
Document code A