《计算机应用研究》|Application Research of Computers

基于sentence2vec与半监督算法的中文问答提问模式抽取

Question pattern extraction based on sentence2vec and semi-supervised algorithm for Chinese Q&A

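Authors 张金壬, 章韵, 王宇
Affiliation School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210003, China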
Article ID 1001-3695(2019)07-027-2035-05
DOI 10.19734/j.issn.1001-3695.2018.01.0020
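Abstract Relation extraction is an important task in information extraction. When processing text in question-answer pair form, the question pattern, which links a question to its answer, must be extracted in addition to the relations between entities in the text. Combining a supervised labeling algorithm (conditional random field) with a semi-supervised algorithm based on bootstrapping of template tuples performs well for extracting relations between entities, but the way traditional semi-supervised methods discover sentence templates is hard to transfer to question pattern extraction. To address this, the paper introduces sentence vectors to compute text similarity and select sentence templates, and proposes a model that combines sentence2vec with a semi-supervised algorithm. The final experiment is validated by random sampling. The results show that the method achieves higher precision and recall than the traditional semi-supervised algorithm.
Keywords relation extraction; question pattern; conditional random field; bootstrapping; sentence vector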
Article URL http://www.arocmag.com/article/01-2019-07-027.html
Authors (romanized) Zhang Jinren, Zhang Yun, Wang Yu
Affiliation (as given in English) School of Computer Science, Nanjing University of Posts & Telecommunications, Nanjing 210003, China
English abstract Relation extraction is an important task in information extraction. When dealing with question-answer pairs, in addition to the relations among the entities in the texts, the question pattern, as the relation connecting questions and answers, also needs to be extracted. The combination of a supervised labeling algorithm (conditional random field) and a semi-supervised algorithm based on a feature template (Bootstrapping) performs well when extracting relations between entities. However, the method for finding templates in the traditional semi-supervised algorithm is hard to transfer to question pattern extraction. Therefore, this paper proposes a model based on the combination of sentence2vec and a semi-supervised algorithm, which introduces sentence vectors to calculate text similarity and select sentence templates. Random sampling is used to verify the final result. The experimental results show that the method achieves higher precision and recall than the traditional semi-supervised algorithm.
English keywords relation extraction; question pattern; conditional random field (CRF); Bootstrapping; sentence2vec
Received 2018/1/10
Revised 2018/3/9
Pages 2035-2039
CLC number TP391
Document code A
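
The abstract describes selecting sentence templates by sentence-vector similarity inside a bootstrapping loop. A minimal sketch of that idea follows. It is not the authors' implementation: the averaged word vectors stand in for sentence2vec, and the function names, the threshold of 0.9, and the round limit are all hypothetical choices made for illustration.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Average the vectors of known tokens (an assumed stand-in for sentence2vec)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bootstrap_templates(seeds, candidates, word_vectors, threshold=0.9, rounds=3):
    """Grow a template set from seed questions.

    Each round accepts every remaining candidate whose sentence vector is
    within `threshold` cosine similarity of any current template, then
    repeats with the enlarged set. Stops early when a round adds nothing.
    """
    templates = list(seeds)
    remaining = list(candidates)
    for _ in range(rounds):
        template_vecs = []
        for t in templates:
            vec = sentence_vector(t, word_vectors)
            if vec is not None:
                template_vecs.append(vec)
        accepted = []
        for cand in remaining:
            vec = sentence_vector(cand, word_vectors)
            if vec is not None and any(
                cosine(vec, tv) >= threshold for tv in template_vecs
            ):
                accepted.append(cand)
        if not accepted:
            break
        templates.extend(accepted)
        remaining = [c for c in remaining if c not in accepted]
    return templates
```

In this sketch each question is a list of tokens and `word_vectors` is a plain dictionary from token to vector; a real system would segment the Chinese questions first and use trained embeddings. The threshold trades precision against recall: a lower value accepts more templates per round but lets the set drift further from the seeds.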