Application Research of Computers

Research on English author name disambiguation based on sparse distributed representation

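Authors: Zhai Xiaorui, Han Hongqi, Zhang Yunliang, Li Zhong
Affiliation: Key Laboratory of Rich-media Knowledge Organization & Service of Digital Publishing Content, Institute of Scientific & Technical Information of China, Beijing 100038, China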
Article number: 1001-3695(2019)12-003-3534-05
DOI: 10.19734/j.issn.1001-3695.2018.07.0380
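Abstract: To apply sparse distributed representation (SDR) theory to author name disambiguation and to assess how well it solves this problem, this paper proposes an SDR-based method for disambiguating author names in English-language papers. The method uses the text of each paper's abstract as the disambiguation feature and encodes it as a binary SDR. A paper is disambiguated by comparing the similarity between its SDR and the SDRs of papers by authors with the same name. The method achieves a precision of 98.21%, a recall of 76.75%, and an <i>F</i> value of 86.17%, indicating that it performs well. A comparison with a method that disambiguates using co-author features shows that the proposed method resolves author name ambiguity effectively. In addition, the method can identify papers whose authors are not yet in the author database and assign them to new authors without retraining or updating the model.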
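The comparison step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each SDR is stored as the set of indices of its active bits, measures similarity as the number of shared active bits divided by the size of the smaller SDR, scores each same-name candidate by their single best-matching paper, and uses a hypothetical threshold to decide between an existing author and a new one. The names `overlap_similarity` and `assign_author`, the threshold value of 0.5, and the toy SDRs are invented for illustration; the abstract does not give the paper's actual encoder, similarity measure, or threshold.

```python
from typing import Dict, FrozenSet, List, Optional

# Assumption: an SDR is represented by the set of indices of its active (1) bits.
SDR = FrozenSet[int]


def overlap_similarity(a: SDR, b: SDR) -> float:
    """Shared active bits divided by the size of the smaller SDR (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def assign_author(
    paper: SDR,
    candidates: Dict[str, List[SDR]],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Optional[str]:
    """Return the same-name candidate whose papers best match `paper`,
    or None if no candidate reaches the threshold (treated as a new author)."""
    best_author: Optional[str] = None
    best_score = 0.0
    for author_id, sdrs in candidates.items():
        # Assumption: an author is scored by their single most similar paper.
        score = max((overlap_similarity(paper, s) for s in sdrs), default=0.0)
        if score > best_score:
            best_author, best_score = author_id, score
    return best_author if best_score >= threshold else None


if __name__ == "__main__":
    # Toy SDRs: tiny made-up sets of active-bit indices, not real fingerprints.
    candidates = {
        "author_A": [frozenset({1, 5, 9, 12}), frozenset({1, 5, 20, 33})],
        "author_B": [frozenset({40, 41, 42, 43})],
    }
    print(assign_author(frozenset({1, 5, 9, 30}), candidates))    # author_A
    print(assign_author(frozenset({70, 71, 72, 73}), candidates))  # None
```

A return value of `None` corresponds to the case the abstract highlights: the paper is attributed to a new author. Because this sketch compares stored SDRs directly, adding that author only means storing the new SDR under a new identifier, with nothing to retrain.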
Keywords: name disambiguation; sparse distributed representation; semantic fingerprint; hierarchical temporal memory
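Funding: National Natural Science Foundation of China (71473237); China Knowledge Centre for Engineering Sciences and Technology construction project (CKCEST-2018-1-26)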
URL: http://www.arocmag.com/article/01-2019-12-003.html
English abstract: In order to apply sparse distributed representation (SDR) theory to author name disambiguation and to evaluate its effectiveness on this problem, this paper proposed an SDR-based method for disambiguating English author names. It selected paper abstracts as the disambiguation feature and encoded them as binary SDRs. It then constructed a similarity matrix from similarity comparisons on the training set and ran the experiments after setting an appropriate threshold. The final precision is 98.21%, the recall is 76.75%, and the <i>F</i>-value is 86.17%, indicating that the proposed method performs well. A comparison with a method based on co-author features shows that the proposed method resolves author name ambiguity better. In addition, the method can identify papers whose authors are not included in the author database and assign them to new authors without relearning or updating the model.
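The English abstract mentions building a similarity matrix on the training set and then choosing a threshold. One common way to pick such a threshold, sketched here as an assumption rather than the paper's documented procedure, is to treat every pair of training papers as a same-author or different-author pair, predict "same author" whenever their similarity reaches the threshold, and keep the candidate threshold with the highest pairwise F value (the harmonic mean of precision and recall). The function names `pairwise_prf` and `choose_threshold`, the candidate grid, and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def pairwise_prf(
    sim: Sequence[Sequence[float]], labels: Sequence[str], threshold: float
) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F for one threshold.

    sim[i][j] is the similarity between training papers i and j;
    labels[i] is the true author of paper i.
    """
    tp = fp = fn = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            predicted_same = sim[i][j] >= threshold
            actually_same = labels[i] == labels[j]
            if predicted_same and actually_same:
                tp += 1
            elif predicted_same:
                fp += 1
            elif actually_same:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def choose_threshold(
    sim: Sequence[Sequence[float]], labels: Sequence[str], grid: List[float]
) -> float:
    """Return the grid value with the highest pairwise F (the first one wins ties)."""
    return max(grid, key=lambda t: pairwise_prf(sim, labels, t)[2])


if __name__ == "__main__":
    # Toy symmetric similarity matrix for four training papers (made-up values).
    sim = [
        [1.0, 0.8, 0.2, 0.1],
        [0.8, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.7],
        [0.1, 0.2, 0.7, 1.0],
    ]
    labels = ["A", "A", "B", "B"]
    best = choose_threshold(sim, labels, [0.1, 0.3, 0.5, 0.9])
    print(best, pairwise_prf(sim, labels, best))  # 0.5 (1.0, 1.0, 1.0)
```

Applying the same F formula to the published figures gives 2 × 0.9821 × 0.7675 / (0.9821 + 0.7675) ≈ 0.8616, which matches the reported 86.17% once the rounding of the 98.21% and 76.75% figures is allowed for. This check assumes the reported F value is the standard harmonic mean of precision and recall, which the abstract does not state explicitly.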
Received: 2018/7/18
Revised: 2018/9/13
Pages: 3534-3538
CLC number: TP391.1
Document code: A