Application Research of Computers (《计算机应用研究》)

Research on a Uyghur stemming method based on a mixed approach (基于混合方法的维吾尔语词干提取方法研究)

Novel approach for Uyghur stemmer using mixed method

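Authors 热娜·艾尔肯 (RANA·Arkin), 李晓 (LI Xiao), 艾尼宛尔·托乎提 (ANWAR·Tohti)
Affiliations 1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011; 2. Xinjiang Uyghur Autonomous Region Institute of Standardization, Urumqi 830000; 3. Xinjiang Multilingual Laboratory, Xinjiang University, Urumqi 830046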
Article ID 1001-3695(2015)01-0112-03
DOI 10.3969/j.issn.1001-3695.2015.01.026
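Abstract To handle the morphological variation of Uyghur, this paper proposes a mixed morphological reduction technique that combines rules with a dictionary. A stemmer is implemented using left-to-right analysis and the Lovins algorithm. Having summarized the rules by which morphemes join, the method extracts stems by rule and verifies the extracted results against a dictionary. Five tests on different news texts gave an average accuracy of 77.4%.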
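The abstract's pipeline (rule-based suffix stripping, scanned left to right, with dictionary verification) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `SUFFIXES` and `LEXICON` sets are tiny Latin-script toy lists, `split_suffixes` and `stem` are hypothetical names, and the longest-match suffix tiling only borrows the general idea of Lovins-style stripping. It is not the paper's rule set or implementation, and it ignores vowel harmony and stem alternation.

```python
from typing import List, Optional, Tuple

# Toy data assumed for illustration only (Latin-script forms, not the paper's resources).
SUFFIXES = frozenset({"lar", "ler", "ni", "ning", "da", "de", "ta", "te", "din", "tin"})
LEXICON = frozenset({"kitab", "mektep", "sheher"})


def split_suffixes(tail: str, suffixes=SUFFIXES) -> Optional[List[str]]:
    """Tile `tail` with known suffixes, preferring the longest match at each step.

    Returns the suffix list, or None when no complete tiling exists.
    """
    if not tail:
        return []
    for length in range(len(tail), 0, -1):
        head = tail[:length]
        if head in suffixes:
            rest = split_suffixes(tail[length:], suffixes)
            if rest is not None:
                return [head] + rest
    return None


def stem(word: str, lexicon=LEXICON, suffixes=SUFFIXES,
         min_stem_len: int = 2) -> Tuple[str, List[str], bool]:
    """Return (stem, suffixes, verified), where `verified` means the stem is in the lexicon."""
    fallback = None
    # Scan split points left to right, so the shortest candidate stem
    # (the longest suffix string) is tried first.
    for cut in range(min_stem_len, len(word)):
        candidate, tail = word[:cut], word[cut:]
        affixes = split_suffixes(tail, suffixes)
        if affixes is None:
            continue  # the suffix rules cannot explain this tail
        if candidate in lexicon:
            return candidate, affixes, True  # rule result confirmed by the dictionary
        if fallback is None:
            fallback = (candidate, affixes, False)  # first rule-only analysis
    if fallback is not None:
        return fallback
    return word, [], word in lexicon  # no suffix analysis found


if __name__ == "__main__":
    for w in ("kitablarni", "mektepte", "dostlar", "sheher"):
        print(w, stem(w))
```

In this sketch the dictionary acts as a filter on rule output: a split is accepted at once when its stem is listed, and a rule-only split is returned with `verified=False` when no candidate stem is found in the lexicon.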
Keywords Uyghur; morphological change; stem; affix; rule-based method; dictionary-based method; mixed method; Lovins algorithm
Article URL http://www.arocmag.com/article/01-2015-01-026.html
Authors (English) RANA·Arkin, LI Xiao, ANWAR·Tohti
Affiliations (English) 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Xinjiang Institute of Standardization, Urumqi 830000, China; 3. Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi 830046, China
English abstract To handle morphological changes in the Uyghur language, this paper proposes a mixed method for morphological reduction that combines rules with a dictionary. Stems are extracted by rule, and the extraction results are verified against a dictionary. Tests were performed on different combinations of features. The experimental results show a recall of 77.4%.
English keywords Uyghur; morphological changes; stem; affixes; rule method; dictionary method; mixed method; Lovins algorithm
References
[1] The Porter stemming algorithm[EB/OL]. [2014-01-25]. http://tartarus.org/martin/PorterStemmer/.
[2] The Lancaster stemming algorithm[EB/OL]. [2014-01-21]. http://www.comp.lancs.ac.uk/computing/research/stemming/.
[3] The Lovins stemming algorithm[OL]. [2013-12-21]. http://snowball.tartarus.org/algorithms/lovins/stemmer.html.
[4] DAWSON J L. Suffix removal for word conflation[J]. Bulletin of the Association for Literary & Linguistic Computing, 1974, 2(3):33-46.
[5] MAYFIELD J, MCNAMEE P. Single n-gram stemming[C]//Proc of the 26th Annual International Retrieval. New York: ACM Press, 2003:415-416.
[6] MELUCCI M, ORIO N. A novel method for stemmer generation based on hidden Markov models[C]//Proc of the 12th International Conference on Information and Knowledge Management. New York: ACM, 2003:131-138.
[7] AISHA B, SUN Ma-song. A statistical method for Uyghur tokenization[C]//Proc of IEEE International Conference on NLP-KE. 2009:383-387.
[8] AISHAN W, TUERGEN Y, ZAOKERE K, TIAN Sheng-wei. Conditional random fields combined FSM stemming method for Uyghur[C]//Proc of the 2nd IEEE International Conference on Computer and Information Technology. 2009:295-299.
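[9] 早克热·卡德尔, 艾山·吾买尔, 吐尔根·依布拉音, et al. Construction of finite-state automata for Uyghur noun inflectional affixes[J]. Journal of Chinese Information Processing (中文信息学报), 2009, 23(6):116-121.
[10] 阿依克孜·卡德尔, 开沙尔·卡德尔, 吐尔根·依布拉音. Morphological analysis of Uyghur nouns for natural language information processing[J]. Journal of Chinese Information Processing (中文信息学报), 2006(3):43-48.
[11] 司马义·阿不都热依木. Research on word formation in modern Uyghur[D]. Urumqi: Xinjiang University, 2006.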
Received 2014/1/21
Revised 2014/3/31
Pages 112-114, 120
CLC number TP391.1
Document code A