Application Research of Computers (《计算机应用研究》)

Research on recognition of Tibetan names based on hierarchical features

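Authors 刘飞飞 (Liu Feifei), 王志娟 (Wang Zhijuan)
Affiliations 1. School of Information Engineering, Minzu University of China, Beijing 100081; 2. Minority Languages Branch, National Language Resource Monitoring and Research Center, Beijing 100081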
Article ID 1001-3695(2018)09-2583-05
DOI 10.3969/j.issn.1001-3695.2018.09.005
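Abstract To improve Tibetan personal name recognition, this paper proposes a recognition algorithm that combines three levels of hierarchical features. The method requires no word segmentation and works purely at the granularity of Tibetan syllables. It uses three layers of name features (internal features, context information, and parallel-relation features) together with the conditional random fields (CRF) algorithm. The internal and context features of names are first used as CRF features; the parallel-relation feature is then formulated as rules to further improve recognition. Without reducing precision, the recall of name recognition improved by 10.43% and the overall F-value reached 95.02%. Within this, the F-value for Tibetan names rose by 11%, and the F-value for transliterated names reached 94.09%. The experimental results show that the method effectively improves Tibetan name recognition.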
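The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature templates or the rules, so the window size, the feature names, the optional `name_syllables` lexicon, the `B-PER`/`I-PER`/`O` label scheme, and the specific parallel-relation rule below are all assumptions made for illustration. The only fixed fact used is that Tibetan syllables are delimited by the tsheg mark (U+0F0B). The feature dictionaries are in the shape a typical CRF toolkit accepts; no CRF is trained here.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def split_syllables(text):
    """Split Tibetan text into syllables on the tsheg mark, with no word segmentation."""
    return [s for s in text.split(TSHEG) if s]


def syllable_features(syllables, i, window=2, name_syllables=frozenset()):
    """Build a CRF-style feature dict for syllable i.

    Internal feature (assumed): the syllable itself and whether it appears in a
    hypothetical lexicon of syllables commonly used in names.
    Context features (assumed): neighbouring syllables within +/- `window`.
    """
    feats = {
        "bias": 1.0,
        "syl": syllables[i],
        "in_name_lexicon": syllables[i] in name_syllables,
    }
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"syl[{off:+d}]"] = syllables[j] if 0 <= j < len(syllables) else "<PAD>"
    return feats


def apply_parallel_rule(syllables, labels, conjunctions):
    """Post-process CRF labels with one hypothetical parallel-relation rule.

    The sequence is cut into segments at conjunction syllables. A segment that
    is entirely unlabeled and sits between two segments that are entirely
    labeled as names is relabeled as a name.
    """
    out = list(labels)

    # Collect (start, end) spans of the segments between conjunctions.
    segments, start = [], 0
    for i, syl in enumerate(syllables):
        if syl in conjunctions:
            segments.append((start, i))
            start = i + 1
    segments.append((start, len(syllables)))

    def all_name(seg):
        a, b = seg
        return b > a and all(labels[k] != "O" for k in range(a, b))

    def all_other(seg):
        a, b = seg
        return b > a and all(labels[k] == "O" for k in range(a, b))

    # Only interior segments have a neighbour on both sides.
    for idx in range(1, len(segments) - 1):
        if (all_other(segments[idx])
                and all_name(segments[idx - 1])
                and all_name(segments[idx + 1])):
            a, b = segments[idx]
            out[a] = "B-PER"
            for k in range(a + 1, b):
                out[k] = "I-PER"
    return out
```

In this sketch the rule can only add name labels and never removes one, which is one way to raise recall while leaving already-recognized names untouched. The conjunction set is passed in by the caller, because the actual coordinating markers used in the paper are not stated in the abstract.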
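Keywords name recognition; hierarchical features; Tibetan; conditional random fields
Funding Key Program of the National Natural Science Foundation of China (61331013)
Research Project of the State Language Commission (WT125-46)
Minzu University of China First-Class University and First-Class Discipline Graduate Independent Research Project (10301-0170040601-184)
Article URL http://www.arocmag.com/article/01-2018-09-005.html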
Author names (English) Liu Feifei, Wang Zhijuan
Affiliations (English) 1. School of Information Engineering, Minzu University of China, Beijing 100081, China; 2. National Language Resource Monitoring & Research Center of Minority Languages, Beijing 100081, China
English abstract To improve Tibetan name recognition, this paper designs an algorithm based on three levels of hierarchical features. It proposes a three-layer feature set defined at the Tibetan syllable level, with no word segmentation required. The three layers are internal features, context information, and the parallel-relation feature. The conditional random fields (CRF) algorithm is used to recognize Tibetan names. First, the internal and context features of names are used as CRF features; then the parallel relations between names are used as rules to further improve recognition. Recall increased by 10.43% and the F-value reached 95.02%. Experiments show that the method is effective for recognizing Tibetan names.
English keywords recognition of names; hierarchical features; Tibetan; conditional random fields
Received 2017/5/5
Revised 2017/6/27
Pages 2583-2587, 2596
CLC number TP391.1
Document code A