《计算机应用研究》|Application Research of Computers

基于无向分块加权图的无模式实体识别方法研究

Research on schema-agnostic entity resolution based on undirected block weighted graph

Authors 杨宁, 卢菁, 邵清, 刘丛
Affiliation 上海理工大学 光电信息与计算机工程学院, 上海 200093
Article ID 1001-3695(2021)01-034-0169-06
DOI 10.19734/j.issn.1001-3695.2019.09.0526
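Abstract Current blocking-based entity resolution schemes ignore blocking-key weights and blocking-key ambiguity, which leads to low accuracy. This paper proposes a schema-agnostic entity resolution method based on an undirected weighted graph. It extracts the attributes in the data sources, combines attribute information entropy with TF-IDF to derive the clustering attributes, and establishes a unified blocking scheme. Based on the relationship between clustering-attribute weights and blocking keys, each group of blocking keys is assigned a weight, and this weight is multiplied by the co-occurrence frequency of each edge to form the undirected block weighted graph. Finally, a pruning scheme removes edges, which resolves the multi-attribute and blocking-key ambiguity problems and improves accuracy. Experiments on seven real-world datasets demonstrate the effectiveness and scalability of the method.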
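Keywords entity resolution; schema-agnostic entity; attribute weight; clustering attributes; undirected block weighted graph
Funding National Natural Science Foundation of China, Young Scientists Fund (61703278)
Article URL http://www.arocmag.com/article/01-2021-01-034.html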
English title Research on schema-agnostic entity resolution based on undirected block weighted graph
Authors (English) Yang Ning, Lu Jing, Shao Qing, Liu Cong
Affiliation (English) School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China
English abstract Current blocking approaches ignore the weights of blocking keys and the ambiguity among blocking keys, which leads to low accuracy. This paper proposes a schema-agnostic entity resolution method based on an undirected weighted graph. The method extracts attributes from the data sources, combines attribute information entropy with TF-IDF to obtain the clustering attributes, and establishes a unified blocking scheme. Using the relationship between clustering-attribute weights and blocking keys, it assigns a weight to each group of blocking keys. This weight is multiplied by the co-occurrence frequency of each edge to form the undirected block weighted graph. Finally, a pruning scheme removes edges. The method thereby addresses the multi-attribute and blocking-key ambiguity problems in the data and increases accuracy. Experiments on seven real data sets show that the method is effective and scalable.
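The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`attribute_entropy`, `build_weighted_graph`, `prune_edges`, the toy `records`) is hypothetical. It rests on several assumptions that the abstract does not confirm: attribute weights come from Shannon entropy alone (the TF-IDF part is omitted), blocking keys are whitespace-separated tokens, each key shared by two records adds its weight to their edge (so keys of equal weight reduce to weight times co-occurrence count), and pruning keeps edges at or above the mean edge weight.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def attribute_entropy(records):
    """Shannon entropy of each attribute's value distribution (assumed weight)."""
    values = defaultdict(Counter)
    for rec in records.values():
        for attr, val in rec.items():
            values[attr][val] += 1
    entropy = {}
    for attr, counts in values.items():
        total = sum(counts.values())
        entropy[attr] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def build_weighted_graph(records, attr_weight):
    """Token blocking, then an undirected graph with weighted co-occurrence edges."""
    blocks = defaultdict(set)  # blocking key -> ids of records containing it
    key_weight = {}            # blocking key -> weight inherited from its attribute
    for rid, rec in records.items():
        for attr, val in rec.items():
            for token in val.lower().split():
                blocks[token].add(rid)
                # Assumption: a key seen in several attributes takes the largest weight.
                key_weight[token] = max(key_weight.get(token, 0.0), attr_weight[attr])
    edges = defaultdict(float)
    for token, ids in blocks.items():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)] += key_weight[token]
    return dict(edges)


def prune_edges(edges):
    """Keep only edges whose weight is at least the mean edge weight (assumed rule)."""
    if not edges:
        return {}
    mean = sum(edges.values()) / len(edges)
    return {pair: w for pair, w in edges.items() if w >= mean}


# Toy data, invented for illustration only.
records = {
    "r1": {"name": "john smith", "city": "london"},
    "r2": {"name": "jon smith", "city": "london"},
    "r3": {"name": "mary jones", "city": "london"},
}
weights = attribute_entropy(records)
graph = build_weighted_graph(records, weights)
kept = prune_edges(graph)
```

In this toy input every record shares the token "london", but the `city` attribute has zero entropy, so that key contributes nothing to any edge; only the pair sharing "smith" survives pruning. This is the kind of uninformative or ambiguous key that unweighted co-occurrence counting would treat the same as a distinctive one.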
Keywords (English) entity resolution; schema-agnostic entity; attribute weight; clustering attributes; undirected block weighted graph
Received 2019/9/2
Revised 2019/11/8
Pages 169-174
CLC number TP391
Document code A