《计算机应用研究》|Application Research of Computers

Morph resolution based on effective context information

Original title: 基于有效上下文信息的变体词还原方法

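Authors 游绩榕 (You Jirong), 沙灜 (Sha Ying), 梁棋 (Liang Qi), 王斌 (Wang Bin)
Affiliations 1. Second Research Laboratory, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China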
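Article ID 1001-3695(2019)06-029-1737-04
DOI 10.19734/j.issn.1001-3695.2018.01.0033
Abstract On social networks, users often coin morphs to stand in for certain entity names, and resolving these morphs back to their original target words is an important task in natural language processing. To address the insufficient accuracy of existing morph resolution methods, this paper proposes a morph resolution method based on effective context information. The method uses pointwise mutual information (PMI) to extract the effective context information of morphs and candidate target words, and fuses it into an autoencoder model to obtain more accurate encodings of both. It then ranks the candidate target words by the similarity computed from these encodings. Experiments show that the method significantly outperforms several current mainstream methods and improves the accuracy of morph resolution.
Keywords morph; morph resolution; autoencoder; effective context information; word embedding; neural network
Funding Ministry of Science and Technology "Eleventh Five-Year Plan" Science and Technology Support Program (2017YFB0803301)
Article URL http://www.arocmag.com/article/01-2019-06-029.html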
English title Morph resolution based on effective context information
Authors (English) You Jirong, Sha Ying, Liang Qi, Wang Bin
Affiliations (English) 1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
English abstract In social networks, people often create morphs to replace some entity names. Resolving these morphs to their real target entities is an important task in natural language processing. To address the inability of existing methods to resolve morphs accurately, this paper proposes a morph resolution method based on effective context information. The method extracts the effective context information of morphs and target candidates and integrates it into autoencoders to obtain more accurate embeddings of morphs and their target candidates. It then calculates the similarity between morphs and target candidates from these embeddings and ranks the target candidates by similarity. Experiments show that this approach significantly outperforms the state-of-the-art methods and improves the accuracy of morph resolution.
English keywords morph; morph resolution; autoencoder; effective context information; word embedding; neural network
Received 2018/1/18
Revised 2018/3/1
Pages 1737-1740,1747
CLC number TP391
Document code A
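
The abstract describes a pipeline: select effective context with pointwise mutual information, encode morphs and candidates, then rank candidates by similarity. The sketch below illustrates only the first and last steps on a toy corpus. It is not the authors' implementation: the autoencoder is replaced by a plain PMI-weighted context vector, PMI is computed at document level, and every name (`pmi_table`, `effective_context`, `rank_candidates`, the tokens `morph_x`, `target_a`, `target_b`, and the zero threshold) is a hypothetical choice made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Document-level PMI for every pair of words that co-occur in some document."""
    n = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    pmi = {}
    for (a, b), c in pair_df.items():
        # PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ), with document frequencies
        score = math.log((c * n) / (word_df[a] * word_df[b]))
        pmi[(a, b)] = score
        pmi[(b, a)] = score
    return pmi


def effective_context(word, pmi, threshold=0.0):
    """Context words whose PMI with `word` exceeds the (assumed) threshold."""
    return {b: s for (a, b), s in pmi.items() if a == word and s > threshold}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(morph, candidates, docs, threshold=0.0):
    """Rank candidate target words by similarity of their effective contexts.

    Stand-in for the paper's autoencoder encoding: PMI-weighted context vectors.
    """
    pmi = pmi_table(docs)
    morph_vec = effective_context(morph, pmi, threshold)
    scored = [
        (c, cosine(morph_vec, effective_context(c, pmi, threshold)))
        for c in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy corpus (invented): each inner list is one tokenized post.
docs = [
    ["morph_x", "scored", "goal", "match"],
    ["morph_x", "goal", "striker"],
    ["target_a", "scored", "goal", "striker"],
    ["target_a", "match", "goal"],
    ["target_b", "album", "concert", "song"],
    ["target_b", "song", "tour"],
    ["fans", "match", "concert"],
]

ranked = rank_candidates("morph_x", ["target_b", "target_a"], docs)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In this toy data `morph_x` and `target_a` never appear in the same post, yet they share the same high-PMI neighbours, so `target_a` ranks above `target_b`. The method described in the abstract would feed the selected context into an autoencoder and compare the learned encodings instead of these raw vectors.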