《计算机应用研究》|Application Research of Computers

基于BLSTM_attention_CRF模型的新能源汽车领域术语抽取

Terminology extraction for new energy vehicle based on BLSTM_attention_CRF model

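Authors Ma Jianhong (马建红), Zhang Yamei (张亚梅), Yao Shuang (姚爽), Zhang Bingfei (张炳斐), Guo Changhong (郭昌宏)
Affiliation School of Computer Science & Software, Hebei University of Technology (河北工业大学 计算机科学与软件学院), Tianjin 300401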
Article ID 1001-3695(2019)05-022-1385-05
DOI 10.19734/j.issn.1001-3695.2017.11.0741
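Abstract To improve the accuracy of terminology extraction in the new energy vehicle domain, this paper proposes a domain terminology extraction model for new energy vehicle patent texts. Traditional domain terminology extraction methods rely too heavily on manually defined features and domain knowledge and cannot automatically mine implicit features, so their recognition performance depends excessively on the quality of the selected features. From a deep learning perspective, the paper proposes a domain terminology extraction model (the BLSTM_attention_CRF model) that combines an attention-based bidirectional long short-term memory (BLSTM) network with conditional random fields (CRF), and corrects the results with a method that combines a dictionary and rules. The accuracy exceeds 86%, which indicates that the method is practical and feasible.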
Keywords domain term extraction; attention mechanism; bidirectional long short-term memory network; conditional random fields; dictionary; rules
Article URL http://www.arocmag.com/article/01-2019-05-022.html
English title Terminology extraction for new energy vehicle based on BLSTM_attention_CRF model
Authors (English) Ma Jianhong, Zhang Yamei, Yao Shuang, Zhang Bingfei, Guo Changhong
Affiliation (English) School of Computer Science & Software, Hebei University of Technology, Tianjin 300401, China
English abstract To improve the accuracy and recall rate of terminology extraction in the field of new energy vehicles, this paper presents a domain terminology extraction model for new energy vehicle patent text. Traditional domain terminology extraction methods rely too heavily on human-defined features and specialized domain knowledge and cannot automatically mine implicit features, so their recognition performance depends greatly on the quality of the selected features. To solve these problems, this paper proposes a model from the perspective of deep learning. First, it extracts domain terms with a model that combines an attention-based BLSTM (bidirectional long short-term memory) network with a CRF (conditional random fields) model (the BLSTM_attention_CRF model); then it corrects the result with a combination of a dictionary and rules. Experimental results show that the accuracy of the BLSTM_attention_CRF model can exceed 86%, which indicates that the model is effective for term extraction in the new energy vehicle domain.
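The abstract does not spell out how the dictionary and rules correct the tagger output, so the following Python sketch is only one plausible reading, not the authors' method. It assumes the sequence model emits BIO tags (`B`, `I`, `O`) per token, and it uses a hypothetical rule (an orphan `I` is promoted to `B`) and a hypothetical longest-match dictionary lookup. The function names, the tag scheme, and the example tokens are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def repair_bio(tags: List[str]) -> List[str]:
    """Rule step (assumed): an 'I' that directly follows 'O' becomes 'B'."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag == "I" and prev == "O":
            tag = "B"
        fixed.append(tag)
        prev = tag
    return fixed


def apply_dictionary(tokens: List[str], tags: List[str],
                     dictionary: Set[Tuple[str, ...]]) -> List[str]:
    """Dictionary step (assumed): relabel each longest dictionary match as one term."""
    out = list(tags)
    longest = max((len(entry) for entry in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            out[i] = "B"
            for j in range(i + 1, i + matched):
                out[j] = "I"
            end = i + matched
            # Keep a following predicted term from merging into the match.
            if end < len(out) and out[end] == "I":
                out[end] = "B"
            i = end
        else:
            i += 1
    return out


def correct_tags(tokens: List[str], tags: List[str],
                 dictionary: Set[Tuple[str, ...]]) -> List[str]:
    """Apply the rule step, then the dictionary step."""
    return apply_dictionary(tokens, repair_bio(tags), dictionary)


def extract_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the token spans labelled B I I ... as terms."""
    terms: List[List[str]] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(current)
            current = []
    if current:
        terms.append(current)
    return [" ".join(term) for term in terms]


# Illustrative input: the tagger missed the first token of the term.
tokens = ["the", "lithium", "ion", "battery", "pack", "is", "cooled"]
predicted = ["O", "O", "I", "I", "O", "O", "O"]
domain_dictionary = {("lithium", "ion", "battery")}
corrected = correct_tags(tokens, predicted, domain_dictionary)
print(extract_terms(tokens, corrected))
```

In this sketch the dictionary overrides the model wherever an entry matches, which favors precision on known terms; a real system might instead use the dictionary only to extend or confirm model predictions.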
English keywords domain term extraction; attention mechanism; bidirectional long short-term memory; conditional random fields; dictionary; rules
Received 2017/11/15
Revised 2018/1/22
Pages 1385-1389, 1395
CLC number TP391
Document code A