《计算机应用研究》|Application Research of Computers

基于条件随机场的汉语词汇特征研究

Study of Chinese lexical features based on conditional random fields

Authors 黄定琦, 史晟辉
Affiliation College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029
Article ID 1001-3695(2020)06-024-1724-05
DOI 10.19734/j.issn.1001-3695.2018.10.0859
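Abstract Written Chinese has no natural word boundaries: no delimiter separates one word from the next, so recognizing Chinese text has to draw on the habits and rules of Chinese writing, the so-called lexical features. Existing studies usually annotate lexical features explicitly in their experiments to improve recognition, which adds manual processing steps and greatly increases the work of porting the algorithm. This paper studies and summarizes the lexical features of common Chinese and, using the feature extraction capability of conditional random fields (CRF), implements complex feature functions that extract lexical features implicitly from a corpus carrying only simple annotations, improving recognition. Experiments show that applying complex lexical features to Chinese word segmentation effectively improves recognition performance and offers a new way to improve the portability of recognition algorithms in applications.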
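Keywords conditional random fields; Chinese lexical features; information extraction; named entity recognition
Funding Project funded by the Beijing Municipal Education Commission (GWGJ201608)
Article URL http://www.arocmag.com/article/01-2020-06-024.html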
English title Study of Chinese lexical features based on conditional random fields
Authors (English) Huang Dingqi, Shi Shenghui
Affiliation (English) College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
English abstract Written Chinese has no delimiters between words, so segmenting Chinese text has to rely on the conventions of Chinese writing, known as lexical features. Previous studies usually marked the lexical features in the training data to improve performance, which added manual processing steps and increased the workload of porting the algorithm. Using conditional random fields (CRF) and only simple tags, this paper improves recognition performance by summarizing the lexical features of Chinese and turning them into complex feature functions used by the CRF. Experiments show that applying complex lexical features to Chinese word segmentation effectively improves recognition performance and offers a new way to improve the portability of recognition algorithms in applications.
English keywords CRF; Chinese lexical features; information extraction; named entity recognition
Received 2018/10/30
Revised 2019/1/15
Pages 1724-1728,1754
CLC number TP391
Document code A
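
The abstract describes feeding a CRF with feature functions computed from plainly tagged text instead of hand-annotated lexical features. As a rough illustration of that general idea only, the sketch below converts a pre-segmented sentence into character-level B/M/E/S tags (a common simple tagging scheme, assumed here) and computes per-character features from the raw characters alone. The function names, the feature set, and the reduplication feature are hypothetical choices for illustration; the paper's actual feature functions are not given on this page.

```python
from typing import Dict, List, Tuple


def words_to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a pre-segmented sentence into (character, tag) pairs.

    B = first character of a multi-character word, M = middle,
    E = last character, S = single-character word.
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            tags = ["B"] + ["M"] * (len(word) - 2) + ["E"]
            pairs.extend(zip(word, tags))
    return pairs


def char_features(chars: List[str], i: int) -> Dict[str, object]:
    """Features for position i, derived only from the raw characters.

    No lexical annotation is read; every feature is computed on the fly.
    The feature set is an assumption made for this sketch.
    """
    cur = chars[i]
    prev = chars[i - 1] if i > 0 else "<BOS>"
    nxt = chars[i + 1] if i < len(chars) - 1 else "<EOS>"
    return {
        "cur": cur,
        "prev": prev,
        "next": nxt,
        "prev+cur": prev + cur,        # character bigram to the left
        "cur+next": cur + nxt,         # character bigram to the right
        "is_digit": cur.isdigit(),
        "repeats_prev": cur == prev,   # reduplication such as 看看
    }


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and B/M/E/S tags."""
    words: List[str] = []
    buf = ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:  # flush an unterminated word at the end of the sentence
        words.append(buf)
    return words
```

The per-character feature dictionaries and tag sequences produced this way are the kind of input a CRF toolkit could be trained on; the choice of toolkit and training settings is left open here.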