《计算机应用研究》 | Application Research of Computers

Integrating articulatory information into stochastic segment models for continuous Mandarin speech recognition

Authors CHAO Hao (晁浩), LIU Zhi-zhong (刘志中), XUE Xiao (薛霄)
Affiliation School of Computer Science & Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
Article ID 1001-3695(2015)04-1087-04
DOI 10.3969/j.issn.1001-3695.2015.04.030
Abstract This paper proposes a method for integrating articulatory information into the stochastic segment model. According to the modeling characteristics of the stochastic segment model, a hierarchical artificial neural network is built to obtain the posterior probabilities that a speech segment belongs to each phoneme class, and these posteriors are integrated into the stochastic segment model system in a single decoding pass. Continuous Mandarin speech recognition experiments on the "863-test" set show a relative reduction of 5.93% in character error rate. The results demonstrate the feasibility of applying articulatory information to the stochastic segment model.
Keywords speech recognition; stochastic segment model; articulatory information; hierarchical artificial neural network; articulatory feature
Funding National Natural Science Foundation of China (91120303, 90820303, 90820011); Basic and Frontier Technology Research Program of Henan Province (132300410332)
Article URL http://www.arocmag.com/article/01-2015-04-030.html
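The abstract describes combining phone posteriors produced by a hierarchical artificial neural network with stochastic segment model scores in a single decoding pass, but gives no implementation details. As a rough illustration only, the sketch below shows one common way such an integration can be expressed: a log-linear interpolation of a hypothesized segment's model score with the averaged ANN posterior of its phone. All names and the weight value are hypothetical and are not taken from the paper.

import math

LAMBDA = 0.3  # assumed interpolation weight for the posterior stream (hypothetical)

def combined_segment_score(ssm_log_likelihood, ann_posteriors, phone, frames):
    # ssm_log_likelihood: log p(frames | phone) from the stochastic segment model
    # ann_posteriors: dict mapping phone -> list of per-frame posteriors p(phone | frame)
    # phone, frames: the hypothesized segment label and its frame indices
    posts = [ann_posteriors[phone][t] for t in frames]
    avg_post = sum(posts) / len(posts)
    # Log-linear combination: add the weighted log posterior to the segment score.
    return ssm_log_likelihood + LAMBDA * math.log(max(avg_post, 1e-10))

if __name__ == "__main__":
    # Toy example: a 3-frame segment hypothesized as the phone "a".
    posteriors = {"a": [0.7, 0.8, 0.6], "i": [0.1, 0.05, 0.2]}
    print(combined_segment_score(-42.0, posteriors, "a", [0, 1, 2]))

In a one-pass decoder of the kind the abstract mentions, a combined score like this would typically replace the plain segment likelihood when hypotheses are expanded, so no separate rescoring pass is needed.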
Received 2014-03-04
Revised 2014-04-18
Pages 1087-1090
CLC number TP391.42
Document code A