English title | Research on schema-agnostic entity resolution based on undirected block weighted graph |
Authors (English) | Yang Ning, Lu Jing, Shao Qing, Liu Cong |
Affiliation (English) | School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China |
English abstract | Existing blocking approaches ignore the weights of blocking keys and the ambiguity between them, which leads to low accuracy. This paper proposes a schema-agnostic entity resolution method based on an undirected block weighted graph. The method extracts attributes from the data sources, combines attribute information entropy with TF-IDF to obtain clustering attributes, and establishes a unified blocking scheme. Using the relationship between clustering-attribute weights and blocking keys, it assigns a weight to each group of blocking keys. This weight is multiplied by the co-occurrence frequency of each edge to give the edge weights of the undirected block weighted graph. Finally, a pruning scheme removes edges from the graph. The method thereby addresses the problems of multiple attributes and blocking-key ambiguity and improves accuracy. Experiments on seven real data sets show that the method is effective and scalable. |
English keywords | entity resolution; schema-agnostic entity; attribute weight; clustering attributes; undirected block weighted graph |
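
The graph construction and pruning steps outlined in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names `build_weighted_blocking_graph` and `prune_edges` are hypothetical, the block-key weights are taken as given (rather than derived from information entropy and TF-IDF), the edge weight is accumulated as one key weight per shared block, and pruning uses a global mean-weight threshold, which is one common choice and not necessarily the paper's scheme.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_blocking_graph(blocks, key_weights):
    """Build an undirected weighted graph over record ids.

    blocks: dict mapping a block key to the set of record ids in that block.
    key_weights: dict mapping a block key to its weight (assumed precomputed).
    Each block shared by two records adds that block's key weight to their
    edge, so co-occurrence frequency and key weight both raise the edge weight.
    """
    edge_weights = defaultdict(float)
    for key, records in blocks.items():
        weight = key_weights.get(key, 1.0)  # assumption: default weight of 1.0
        # Sorting gives each undirected edge one canonical (low, high) form.
        for a, b in combinations(sorted(records), 2):
            edge_weights[(a, b)] += weight
    return dict(edge_weights)


def prune_edges(edge_weights):
    """Keep only edges whose weight is at least the mean edge weight."""
    if not edge_weights:
        return {}
    threshold = sum(edge_weights.values()) / len(edge_weights)
    return {edge: w for edge, w in edge_weights.items() if w >= threshold}


if __name__ == "__main__":
    # Toy data: the frequent key "the" carries little weight.
    blocks = {"apple": {1, 2, 3}, "iphone": {1, 2}, "the": {1, 2, 3, 4}}
    key_weights = {"apple": 0.6, "iphone": 0.9, "the": 0.1}
    graph = build_weighted_blocking_graph(blocks, key_weights)
    print(sorted(prune_edges(graph)))
```

In this toy input, record 4 shares only the low-weight key with the other records, so all of its edges fall below the mean and are pruned, while the edges among records 1, 2 and 3 survive as candidate matches.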