《计算机应用研究》|Application Research of Computers

基于三元组特征和词向量技术的中文专利侵权检测研究 (Research on Chinese Patent Infringement Detection Based on Triple Features and Word Embedding)

Infringement detection of Chinese patent based on three tuple character and word embedding

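Authors 金健 (Jin Jian), 朱玉全 (Zhu Yuquan), 陈耿 (Chen Geng)
Affiliations 1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013; 2. School of Technology, Nanjing Audit University, Nanjing 211815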
Article ID 1001-3695(2017)10-2901-04
DOI 10.3969/j.issn.1001-3695.2017.10.005
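Abstract In Chinese patent infringement detection, keyword features have weak expressive power and sentence-structure features tend to introduce noise. To address both problems, this paper proposes a method that improves detection by extracting triple features. The method converts a patent's claims into a set of triple features and computes the semantic similarity between triples by combining word embeddings with HowNet, which improves the identification of suspected infringing patents. Experimental results show that the method achieves good detection performance and higher accuracy than other methods.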
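Keywords patent infringement; information extraction; word embedding; similarity computation; text processing
Funding National Natural Science Foundation of China (71271117)
Jiangsu Province Six Talent Peaks Project (2013-WLW-005)
Natural Science Foundation of Jiangsu Province (BK20150531)
Article URL http://www.arocmag.com/article/01-2017-10-005.html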
English title Infringement detection of Chinese patent based on three tuple character and word embedding
Author names (English) Jin Jian, Zhu Yuquan, Chen Geng
Affiliations (English) 1. School of Computer Science & Communication Engineering, Jiangsu University, Zhenjiang Jiangsu 212013, China; 2. School of Technology, Nanjing Audit University, Nanjing 211815, China
English abstract To address the weak expressive power of keyword features and the noise introduced by sentence-structure features in Chinese patent infringement detection, this paper proposes a method that improves detection by extracting three-tuple features from patent claims. The method represents the patent claims as a set of three-tuple features and calculates the similarity between these features by combining word embedding and HowNet, which improves the ability to identify suspected patent infringement. Experimental results show that the proposed method achieves good detection results and higher accuracy than other methods.
English keywords patent infringement; information extraction; word embedding; similarity computation; text processing
Received 2016/7/20
Revised 2016/9/9
Pages 2901-2904
CLC number TP391.1
Document code A
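
The abstract describes representing patent claims as sets of triples and scoring their semantic similarity with word embeddings and HowNet. The record does not give the actual formulas, so the following is only a minimal sketch under stated assumptions. The toy vectors, the `LEXICON_SIM` table standing in for a HowNet-based measure, the mixing weight `alpha`, and the averaging and best-match aggregation are all hypothetical choices made for illustration, not the authors' method.

```python
import math

# Toy 3-dimensional word vectors (made-up values standing in for trained embeddings).
VECTORS = {
    "motor": (0.9, 0.1, 0.0),
    "engine": (0.8, 0.2, 0.1),
    "drives": (0.1, 0.9, 0.0),
    "rotates": (0.2, 0.8, 0.1),
    "wheel": (0.0, 0.1, 0.9),
    "gear": (0.1, 0.2, 0.8),
}

# Hypothetical lexicon scores standing in for a HowNet-based word similarity.
LEXICON_SIM = {frozenset(("motor", "engine")): 0.9}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2, alpha=0.5):
    """Mix embedding and lexicon similarity; alpha is an assumed weight."""
    if w1 == w2:
        return 1.0
    emb = 0.0
    if w1 in VECTORS and w2 in VECTORS:
        emb = cosine(VECTORS[w1], VECTORS[w2])
    lex = LEXICON_SIM.get(frozenset((w1, w2)))
    if lex is None:
        # No lexicon entry: fall back to the embedding score alone.
        return emb
    return alpha * emb + (1 - alpha) * lex


def triple_similarity(t1, t2):
    """Average the word similarities of the three aligned components."""
    return sum(word_similarity(a, b) for a, b in zip(t1, t2)) / 3


def claim_similarity(claim_a, claim_b):
    """For each triple in claim_a take its best match in claim_b, then average."""
    if not claim_a or not claim_b:
        return 0.0
    best = [max(triple_similarity(t, u) for u in claim_b) for t in claim_a]
    return sum(best) / len(best)


# Usage: triples are (subject, relation, object) word tuples.
claim_a = [("motor", "drives", "wheel")]
claim_b = [("engine", "rotates", "gear")]
claim_c = [("wheel", "drives", "motor")]
print(claim_similarity(claim_a, claim_b))
print(claim_similarity(claim_a, claim_c))
```

In this sketch, `claim_c` reuses the words of `claim_a` in swapped roles and scores lower than the paraphrase `claim_b`, which illustrates why role-aligned triples can discriminate better than a bag of keywords. Note that `claim_similarity` is asymmetric as written; a symmetric variant could average both directions.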