Application Research of Computers

Machine translation model integrated with bilingual maximal-length noun phrase

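Authors: Li Yegang, Liang Lijun, Sun Fuzhen, Wang Shaoqing, Yu Xiao
Affiliation: College of Computer Science & Technology, Shandong University of Technology, Zibo, Shandong 255049, China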
Article ID: 1001-3695(2017)05-1316-05
DOI: 10.3969/j.issn.1001-3695.2017.05.008
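Abstract: Integrating valuable syntactic-level linguistic knowledge into statistical machine translation (SMT) is of significant theoretical and practical value for advancing SMT. This paper proposes three strategies, ordered from simple to complex, for integrating bilingual maximal-length noun phrases (MNPs) into the statistical translation model; overall translation performance improves with each successive strategy. Method-Ⅲ adopts a divide-and-conquer strategy: it integrates MNPs into SMT as hard constraints and, at the level of bilingual MNPs, combines the phrase-based translation model with the hierarchical phrase-based model. It yields the largest improvement to the translation system. The proposed strategies substantially improve the quality of the phrase-based translation model, and on complex long sentences Method-Ⅲ raises the BLEU score by 3.03% over the phrase-based baseline.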
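The divide-and-conquer, hard-constraint idea summarized above can be illustrated with a minimal Python sketch. It assumes the MNP spans of the source sentence are already known and treats the two translators as black-box functions. The abstract does not specify which model handles which part, so that assignment is left to the caller. The placeholder format `@NP{i}@`, the function names and the toy dictionaries are hypothetical illustrations, not the authors' implementation.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end) of one MNP (assumed given)
Translator = Callable[[List[str]], List[str]]
_PLACEHOLDER = re.compile(r"@NP(\d+)@")  # hypothetical placeholder format


def split_by_mnp(tokens: List[str], spans: List[Span]) -> Tuple[List[str], List[List[str]]]:
    """Divide: replace each MNP span with one placeholder token @NP{i}@."""
    skeleton: List[str] = []
    mnps: List[List[str]] = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos or end <= start or end > len(tokens):
            raise ValueError(f"invalid or overlapping MNP span {(start, end)}")
        skeleton.extend(tokens[pos:start])
        skeleton.append(f"@NP{len(mnps)}@")
        mnps.append(tokens[start:end])
        pos = end
    skeleton.extend(tokens[pos:])
    return skeleton, mnps


def translate_with_mnp_constraint(tokens: List[str], spans: List[Span],
                                  translate_skeleton: Translator,
                                  translate_mnp: Translator) -> List[str]:
    """Conquer and merge: each MNP is translated as one atomic unit (hard constraint)."""
    skeleton, mnps = split_by_mnp(tokens, spans)
    mnp_out = [translate_mnp(phrase) for phrase in mnps]
    result: List[str] = []
    used = set()
    for tok in translate_skeleton(skeleton):
        m = _PLACEHOLDER.fullmatch(tok)
        if m:
            idx = int(m.group(1))
            if idx >= len(mnp_out):
                raise ValueError(f"unknown placeholder {tok}")
            result.extend(mnp_out[idx])  # splice the MNP translation back in
            used.add(idx)
        else:
            result.append(tok)
    if used != set(range(len(mnps))):
        # hard constraint: every MNP placeholder must survive skeleton translation
        raise ValueError("hard constraint violated: an MNP placeholder was lost")
    return result


# Hypothetical toy translators standing in for real SMT decoders.
LEX: Dict[str, str] = {"我": "I", "喜欢": "like"}
NP_LEX: Dict[Tuple[str, ...], List[str]] = {("红色", "的", "苹果"): ["red", "apples"]}


def toy_skeleton_mt(toks: List[str]) -> List[str]:
    return [LEX.get(t, t) for t in toks]  # placeholders and unknown words pass through


def toy_mnp_mt(phrase: List[str]) -> List[str]:
    return NP_LEX.get(tuple(phrase), list(phrase))


src = ["我", "喜欢", "红色", "的", "苹果"]
print(" ".join(translate_with_mnp_constraint(src, [(2, 5)], toy_skeleton_mt, toy_mnp_mt)))
# prints: I like red apples
```

Replacing each MNP with a single placeholder is one simple way to make the constraint hard: the skeleton translator cannot reorder or split words inside an MNP, and the final check rejects any output in which a placeholder was dropped.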
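Keywords: statistical machine translation; phrase-based translation model; maximal-length noun phrase; bilingual maximal-length noun phrase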
Funding: National Basic Research Program of China (973 Program) (2013CB329303); Key Program of the National Natural Science Foundation of China (61132009)
Article URL: http://www.arocmag.com/article/01-2017-05-008.html
Received: 2016-03-22
Revised: 2016-05-16
Pages: 1316-1320
CLC number: TP391.1
Document code: A