Application Research of Computers

Analysis of sampling effectiveness of big data based on shuffling algorithm

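Authors: Liu Hanyue, Zhang Chunsheng
Affiliation: College of Computer Science & Technology, Inner Mongolia University for Nationalities, Tongliao, Inner Mongolia 028043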
Article ID: 1001-3695(2021)10-029-3049-06
DOI: 10.19734/j.issn.1001-3695.2021.02.0055
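Abstract: The shuffling algorithm based on the folding technique scrambles data well and can therefore satisfy the precondition for big data sampling. To verify that sampling does not destroy the internal rules of the scrambled data set, association rule analysis was applied, by data mining, to the data before and after sampling. A comparison of the support and confidence of the resulting association rules, together with the frequency of transactions, shows that for data scrambled by the folding shuffle algorithm, the association rules obtained before and after sampling remain relatively stable. A further comparison with existing algorithms in terms of time efficiency and overall sampling error leads to the theoretical conclusion that big data sampling is effective, that is, the overall characteristics of the data can be inferred from a sample.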
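The abstract compares the support and confidence of association rules mined before and after sampling a scrambled data set. The Python sketch below illustrates that comparison under stated assumptions: the paper's folding shuffle is not described on this page, so a standard Fisher-Yates shuffle is swapped in as a stand-in; the sample is assumed to be the first k records after shuffling; and the function names and the milk/bread toy data are hypothetical, chosen only for illustration.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items using the Fisher-Yates algorithm.

    Stand-in only (assumption): the paper uses a folding-based shuffle that is
    not specified here, so a standard uniform shuffle is used instead.
    """
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform index in [0, i], both ends inclusive
        a[i], a[j] = a[j], a[i]
    return a


def shuffle_then_sample(transactions, k, seed=0):
    """Scramble the data set, then take the first k records as the sample."""
    rng = random.Random(seed)
    return fisher_yates_shuffle(transactions, rng)[:k]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent | consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / base


def make_toy_transactions():
    """Hypothetical toy data: 1000 baskets, grouped by basket type."""
    return ([frozenset({"milk", "bread"})] * 600
            + [frozenset({"milk"})] * 200
            + [frozenset({"bread"})] * 200)


if __name__ == "__main__":
    full = make_toy_transactions()
    sample = shuffle_then_sample(full, k=200, seed=42)
    for name, data in (("full", full), ("sample", sample)):
        s = support(data, {"milk", "bread"})
        c = confidence(data, {"milk"}, {"bread"})
        print(f"{name}: support={s:.3f} confidence={c:.3f}")
```

On this toy data the full set gives support 0.6 for {milk, bread} and confidence 0.75 for milk => bread, and the shuffled sample should land near those values. Taking the first 200 records without shuffling would instead give support 1.0, because the toy list is grouped by basket type; this is the kind of distortion that makes scrambling a prerequisite for sampling.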
Keywords: big data; sampling technique; folding shuffle algorithm; association rules; sampling effectiveness
Funding: National Natural Science Foundation of China (81460656); Natural Science Foundation of Inner Mongolia (2018MS06016)
URL: http://www.arocmag.com/article/01-2021-10-029.html
English title: Analysis of sampling effectiveness of big data based on shuffling algorithm
English author names: Liu Hanyue, Zhang Chunsheng
English affiliation: College of Computer Science & Technology, Inner Mongolia University for Nationalities, Tongliao, Inner Mongolia 028043, China
English abstract: The shuffling algorithm based on the folding technique scrambles data effectively and can meet the prerequisite of big data sampling. To show that the internal rules of the scrambled data set are not destroyed by sampling, this paper analyzed the association rules of the data before and after sampling by data mining. Comparing the support and confidence of the association rules, as well as the frequency of the transactions, showed that the association rules of the data scrambled by the folding shuffle algorithm were stable before and after sampling. A further comparison with existing algorithms in time efficiency and overall sampling error leads to the conclusion that big data sampling is effective, which means that the overall characteristics of the data can be inferred from a sample.
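For reference, the two rule metrics compared above have the standard definitions below, written for a transaction database D and itemsets X and Y (this notation is assumed here and is not taken from the paper):

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```

A rule counts as stable under sampling when both quantities, computed on the sample instead of on D, stay close to their full-data values.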
English keywords: big data; sampling technology; shuffling algorithm; association rules; sampling effectiveness
Received: 2021/2/19
Revised: 2021/4/7
Pages: 3049-3054
CLC number: TP312
Document code: A