Application Research of Computers

Automatic generation of data cleaning rule chain based on poset

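Authors: He Jun, Zhang Caiqing, Li Xiaozhen, Zhang Dehai
Affiliations: 1. College of Information Engineering, Kunming University, Kunming 650214, China; 2. a. College of Foreign Languages, b. College of Software, Yunnan University, Kunming 650206, China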
Article ID: 1001-3695(2021)01-016-0083-05
DOI: 10.19734/j.issn.1001-3695.2019.12.0617
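Abstract: To address the frequent logical conflicts between rules and the high error rate in data cleaning, this paper proposes a poset-based method for automatically generating rule chains. A hierarchical, compositional data cleaning framework classifies and processes the rules from the top down; a poset and its Hasse diagram are then used to automatically generate a logically correct and consistent rule chain for each level, and corresponding generation and automatic cleaning algorithms are designed. Experiments on data from the poverty alleviation domain show that the method improves data cleaning efficiency to some extent and markedly reduces the error rate of the cleaning results, which supports the soundness and effectiveness of the method.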
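The abstract does not spell out the generation algorithm, so the following Python sketch only illustrates the general poset idea under stated assumptions; it is not the authors' implementation. It assumes the rules of one level are related by a hypothetical "must run before" relation given as pairs. The sketch takes the transitive closure, rejects cycles (a rule that would have to run before itself, i.e. a logical conflict), reduces the order to its Hasse diagram (the cover relation), and reads off one rule chain as a linear extension using Kahn's topological sort. The functions `rule_chain`, `hasse_edges`, `transitive_closure` and all rule names are invented for illustration.

```python
from collections import deque
from itertools import product


def transitive_closure(elements, pairs):
    """Warshall's algorithm: all (a, b) such that a must precede b, directly or indirectly."""
    reach = set(pairs)
    # product(..., repeat=3) varies k slowest, so k is the outer (intermediate) loop.
    for k, i, j in product(elements, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach


def hasse_edges(elements, closure):
    """Cover relation of a strict partial order: keep a < b only if no c has a < c < b."""
    return {
        (a, b)
        for a, b in closure
        if not any((a, c) in closure and (c, b) in closure for c in elements)
    }


def rule_chain(rules, precedes):
    """Return (chain, hasse), where chain is one linear extension of the rule poset.

    rules: rule names (hypothetical); their order is used to break ties.
    precedes: pairs (a, b) meaning "rule a must run before rule b" (assumed relation).
    """
    closure = transitive_closure(rules, precedes)
    # A rule that (transitively) precedes itself means the relation has a cycle,
    # so it is not a partial order: treat this as a logical conflict between rules.
    conflicts = sorted(r for r in rules if (r, r) in closure)
    if conflicts:
        raise ValueError(f"conflicting rule order involving: {conflicts}")

    hasse = hasse_edges(rules, closure)

    # Kahn's topological sort over the Hasse diagram
    # (FIFO queue; successors are visited in input order).
    position = {r: i for i, r in enumerate(rules)}
    indegree = {r: 0 for r in rules}
    successors = {r: [] for r in rules}
    for a, b in hasse:
        successors[a].append(b)
        indegree[b] += 1
    ready = deque(r for r in rules if indegree[r] == 0)
    chain = []
    while ready:
        rule = ready.popleft()
        chain.append(rule)
        for nxt in sorted(successors[rule], key=position.__getitem__):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return chain, hasse


if __name__ == "__main__":
    # Hypothetical rules for one level of household records; names are illustrative only.
    rules = [
        "trim_whitespace",
        "normalize_id",
        "fill_missing_income",
        "dedupe_households",
        "validate_income_range",
    ]
    precedes = [
        ("trim_whitespace", "normalize_id"),
        ("normalize_id", "dedupe_households"),
        ("trim_whitespace", "fill_missing_income"),
        ("fill_missing_income", "validate_income_range"),
        ("trim_whitespace", "validate_income_range"),  # implied; dropped from the Hasse diagram
        ("dedupe_households", "validate_income_range"),
    ]
    chain, hasse = rule_chain(rules, precedes)
    print(" -> ".join(chain))
    # trim_whitespace -> normalize_id -> fill_missing_income -> dedupe_households -> validate_income_range
    print(len(hasse))  # 5
```

Because the chain is read from the Hasse diagram, redundant constraints such as the direct trim-before-validate pair do not affect the result. When several rules are incomparable, any linear extension is a valid chain; this sketch simply picks one deterministically from the input order.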
Keywords: poset; data cleaning; rule chain; Hasse diagram; poverty alleviation domain
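Funding: National Natural Science Foundation of China (61263043, 61864004);
Joint Special Fund for Basic Research of Local Undergraduate Universities in Yunnan Province (2017FH001-05)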
Article URL: http://www.arocmag.com/article/01-2021-01-016.html
Received: 2019/12/9
Revised: 2020/1/21
Pages: 83-87
CLC number: TP391
Document code: A