English title | Automatic generation of data cleaning rule chain based on poset |
English author names | He Jun, Zhang Caiqing, Li Xiaozhen, Zhang Dehai |
English affiliations | 1. College of Information Engineering, Kunming University, Kunming 650214, China; 2. a. College of Foreign Languages, b. College of Software, Yunnan University, Kunming 650206, China |
English abstract | To address the frequent logical conflicts between rules and the high error rate in data cleaning, this paper proposes an automatic rule chain generation method based on partially ordered sets (posets). A hierarchical data cleaning framework classifies and processes the rules from top to bottom. The rule chain of each level is generated automatically using the poset and its Hasse diagram, and the corresponding generation algorithm and automatic cleaning algorithm are designed. Taking poverty alleviation data as an example, the results show that the proposed method improves the efficiency of data cleaning and reduces the error rate of the cleaning results, which supports the soundness and effectiveness of the method. |
English keywords | poset; data cleaning; rule chain; Hasse diagram; poverty alleviation area |
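
The abstract's central idea, ordering cleaning rules through a poset and its Hasse diagram, can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the abstract does not spell out. It assumes that a "must run before" relation between rules is given as pairs, reduces it to the covering relations (the edges of the Hasse diagram), and reads off one linear extension as a rule chain. The function names and the rule names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def transitive_closure(elements, relation):
    """Return the strict order as a set of (a, b) pairs: a must run before b."""
    closure = {(a, b) for a, b in relation if a != b}
    # Warshall-style closure: the intermediate element k is the outermost loop.
    for k in elements:
        for a in elements:
            for b in elements:
                if (a, k) in closure and (k, b) in closure:
                    closure.add((a, b))
    return closure


def hasse_edges(elements, relation):
    """Return the covering pairs, i.e. the edges of the Hasse diagram."""
    closure = transitive_closure(elements, relation)
    if any((a, a) in closure for a in elements):
        raise ValueError("relation contains a cycle, so it is not a partial order")
    # (a, b) is a covering pair when no element c lies strictly between a and b.
    return {
        (a, b)
        for (a, b) in closure
        if not any((a, c) in closure and (c, b) in closure for c in elements)
    }


def rule_chain(elements, relation):
    """Return one linear extension of the poset, usable as an execution order."""
    sorter = TopologicalSorter({e: set() for e in elements})
    for a, b in hasse_edges(elements, relation):
        sorter.add(b, a)  # b depends on a, so a is emitted first
    return list(sorter.static_order())
```

As a usage example with hypothetical rules, `rule_chain(["trim", "normalize", "dedupe", "validate"], {("trim", "normalize"), ("normalize", "dedupe"), ("trim", "dedupe"), ("trim", "validate")})` returns an order in which `trim` comes first and `normalize` precedes `dedupe`. The redundant pair `("trim", "dedupe")` does not appear among the Hasse edges. Rules that are incomparable in the poset may come out in either order, so a real system would need an additional tie-breaking policy.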