《计算机应用研究》|Application Research of Computers

面向实体链接的多特征图模型实体消歧方法

Entity disambiguation method based on multi-feature fusion graph model for entity linking

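Authors Gao Yanhong (高艳红), Li Aiping (李爱萍), Duan Liguo (段利国)
Affiliations 1. School of Computer Science & Technology, Taiyuan University of Technology, Taiyuan 030024; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072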
Article ID 1001-3695(2017)10-2909-06
DOI 10.3969/j.issn.1001-3695.2017.10.007
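Abstract Entity linking is the process of correctly linking entity mentions in text to entities in a knowledge base, and the accuracy of named entity disambiguation directly affects the accuracy of entity linking. To address named entity disambiguation in Chinese entity linking, this paper proposes a solution that fuses multiple features. First, with Chinese Wikipedia as the supporting knowledge base, multiple semantic features are extracted from two sources, the context of the entity mention and the content description of each candidate entity in Wikipedia, and semantic similarities are computed. Next, the semantic similarities are incorporated into a constructed graph model, and the final stationary distribution of that graph model is computed with the PageRank algorithm. Finally, the candidate entities are ranked and the top-1 entity is selected as the disambiguated entity linking result. In experiments, the F-value improved by 9% over a baseline system that disambiguates using only name-mention features, and it was also higher than the F-values of other entity linking experiments, indicating that the method achieves good overall performance on named entity disambiguation for Chinese entity linking.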
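The graph-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the graph layout (one mention node with similarity-weighted edges to its candidates), the damping factor of 0.85, the restart distribution, and the similarity values 0.8 and 0.2 are all assumptions chosen for illustration. The sketch runs PageRank by power iteration until the distribution is stationary, then picks the top-1 candidate.

```python
def pagerank(weights, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a weighted directed graph.

    weights[u][v] is the non-negative weight of the edge u -> v.
    prior[u] is the restart probability of node u (values sum to 1).
    """
    nodes = list(prior)
    rank = dict(prior)
    for _ in range(max_iter):
        # Every node first receives its share of the restart mass.
        new_rank = {n: (1.0 - damping) * prior[n] for n in nodes}
        for u in nodes:
            out_edges = weights.get(u, {})
            total = sum(out_edges.values())
            if total == 0.0:
                # Dangling node: hand its mass back to the restart distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * prior[n]
            else:
                # Spread mass along outgoing edges in proportion to edge weight.
                for v, w in out_edges.items():
                    new_rank[v] += damping * rank[u] * w / total
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical example: one mention with two candidate entities.
# The edge weights stand in for fused semantic similarities (assumed values).
candidates = ["Apple Inc.", "Apple (fruit)"]
weights = {"mention": {"Apple Inc.": 0.8, "Apple (fruit)": 0.2}}
prior = {"mention": 1.0, "Apple Inc.": 0.0, "Apple (fruit)": 0.0}

rank = pagerank(weights, prior)
ranked = sorted(candidates, key=lambda c: rank[c], reverse=True)
top1 = ranked[0]
print(top1, {c: round(rank[c], 4) for c in candidates})
```

In a fuller graph, candidates of different mentions would also be connected to each other, so that mutually related candidates reinforce one another; the single-mention graph above only shows how similarity weights steer the stationary distribution.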
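Keywords Chinese entity linking; entity disambiguation; semantic features; graph model
Funding National Natural Science Foundation of China (61572345)
Article URL http://www.arocmag.com/article/01-2017-10-007.html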
English title Entity disambiguation method based on multi-feature fusion graph model for entity linking
Authors (English names) Gao Yanhong, Li Aiping, Duan Liguo
Affiliations (English names) 1. School of Computer Science & Technology, Taiyuan University of Technology, Taiyuan 030024, China; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
English abstract Entity linking is the task of linking name mentions in a document with their referent entities in a knowledge base. The accuracy of named entity disambiguation directly affects the accuracy of entity linking. To address named entity disambiguation in Chinese entity linking, this paper proposes a disambiguation method based on multi-feature fusion. First, it uses Chinese Wikipedia as the knowledge base and makes full use of Wikipedia's rich structural information, such as abstracts, categories, disambiguation pages, and anchor text. It then extracts a variety of semantic features to measure the semantic similarity between the context of an entity mention and the information about each candidate entity in Wikipedia. Next, it builds a graph that represents the relationships between the name mention and the candidate entities, incorporating these similarities. Finally, it uses the PageRank algorithm to rank the candidate entities and chooses the top-1 entity as the entity linking result. Compared with a baseline system that relies only on the surface-form features of name mentions, the F-value increased by 9%. The proposed approach can improve the performance of an entity linking system.
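As a rough illustration of the similarity step, the sketch below fuses several features into one score per candidate. The abstract does not specify the features or how they are combined, so everything here is an assumption: the three features (cosine similarity against the candidate's abstract, Jaccard overlap against its categories and its anchor texts), the fusion weights (0.5, 0.3, 0.2), the linear combination, and the toy tokens are all hypothetical.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(items_a, items_b):
    """Jaccard overlap between two collections, treated as sets."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def fused_similarity(context_tokens, candidate, weights=(0.5, 0.3, 0.2)):
    """Linear fusion of three feature similarities (weights are assumed)."""
    w_abstract, w_category, w_anchor = weights
    return (
        w_abstract * cosine(context_tokens, candidate["abstract"])
        + w_category * jaccard(context_tokens, candidate["categories"])
        + w_anchor * jaccard(context_tokens, candidate["anchors"])
    )


# Hypothetical, pre-tokenized context of the mention "apple".
context = ["iphone", "company", "released", "phone"]

# Hypothetical candidate records built from Wikipedia-style fields.
candidate_info = {
    "Apple Inc.": {
        "abstract": ["company", "iphone", "technology"],
        "categories": ["company", "technology"],
        "anchors": ["apple", "iphone"],
    },
    "Apple (fruit)": {
        "abstract": ["fruit", "tree", "sweet"],
        "categories": ["fruit"],
        "anchors": ["apple"],
    },
}

scores = {name: fused_similarity(context, info)
          for name, info in candidate_info.items()}
print(scores)
```

Scores like these could serve as edge weights between a mention and its candidates in the graph. For Chinese text, the token lists would come from a word segmenter, which this sketch leaves out.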
English keywords Chinese entity linking; entity disambiguation; semantic features; graph model
Received 2016/7/12
Revised 2016/9/5
Pages 2909-2914
CLC number TP391.1
Document code A