《计算机应用研究》|Application Research of Computers

Multilingual text classification method based on bi-directional long short-term memory and convolutional neural network

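Authors Meng Xianyan (孟先艳), Cui Rongyi (崔荣一), Zhao Yahui (赵亚慧), Fang Mingzhu (方明洙)
Affiliations 1. Intelligent Information Processing Laboratory, Department of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002; 2. Science and Technology Information Service Center of Yanbian Korean Autonomous Prefecture, Yanji, Jilin 133002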
Article ID 1001-3695(2020)09-021-2669-05
DOI 10.19734/j.issn.1001-3695.2019.04.0132
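Abstract To handle the growing volume of multilingual text and to classify documents in different languages under a single category system, thereby making full use of multilingual text information, this paper proposes BiLSTM-CNN, a multilingual text classification model that combines bidirectional long short-term memory (BiLSTM) units with a convolutional neural network (CNN). For each language, a BiLSTM network extracts text features and a CNN refines them, yielding a deeper text representation for that language. The representations of all languages are then concatenated and fed to a softmax function to predict the category. Experiments on a parallel dataset of Chinese, English, and Korean scientific and technical literature show that the method improves classification accuracy by 4% over the baseline method, classifies text in any one of the languages correctly, and has good extensibility.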
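Keywords multilingual text classification; long short-term memory unit; convolutional neural network
Funding State Language Commission "13th Five-Year Plan" Research Project (YB135-76)
Yanbian University Research Project for World-Class Discipline Construction in Foreign Languages and Literature (18YLPY13)
Article URL http://www.arocmag.com/article/01-2020-09-021.html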
English title Multilingual text classification method based on bi-directional long short-term memory and convolutional neural network
Authors (English) Meng Xianyan, Cui Rongyi, Zhao Yahui, Fang Mingzhu
Affiliations (English) 1. Intelligent Information Processing Lab, Dept. of Computer Science & Technology, Yanbian University, Yanji Jilin 133002, China; 2. Sci & Tech Information Service Centre of Yanbian Korean Autonomous Prefecture, Yanji Jilin 133002, China
English abstract To classify texts in different languages under the same category system and make full use of multilingual text information, this paper proposed BiLSTM-CNN, a multilingual text classification model that combines bidirectional long short-term memory and convolutional neural networks. For each language, it extracted text features with a bidirectional long short-term memory network and introduced a convolutional neural network to extract local text information for feature optimization, producing a distributed text representation of the documents in that language. Finally, it concatenated the text representations of all languages and fed them into a softmax function to predict the category. Experiments on parallel datasets of Chinese, English, and Korean scientific and technological documents show that the proposed model improves classification accuracy by 4% over the baseline method, correctly classifies text in any one of the languages, and has good extensibility.
English keywords multilingual text categorization; long short-term memory; convolutional neural network
Received 2019-04-10
Revised 2019-05-30
Pages 2669-2673
CLC number TP391.1
Document code A
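
Illustrative sketch The pipeline described in the abstract (a BiLSTM encoder per language, a convolutional layer that refines its output, concatenation of the per-language representations, and a softmax classifier) can be sketched as a NumPy forward pass. This is not the authors' implementation. The abstract does not give layer sizes, the convolution width, the activation, or the pooling scheme, so the ReLU, the max-over-time pooling, every dimension, the random untrained weights, and all function and variable names below are assumptions made only for illustration. The sketch also assumes that all three parallel versions of a document are available as input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, d_in, d_h):
    """Random, untrained weights for one LSTM direction (four gates stacked)."""
    return (rng.normal(0, 0.1, (d_in, 4 * d_h)),   # input-to-gates
            rng.normal(0, 0.1, (d_h, 4 * d_h)),    # hidden-to-gates
            np.zeros(4 * d_h))                     # gate biases

def lstm_pass(x, W, U, b, reverse=False):
    """Run one LSTM direction over x of shape (T, d_in); return (T, d_h)."""
    T, d_h = x.shape[0], U.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    out = np.zeros((T, d_h))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        i, f, o, g = np.split(x[t] @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
        out[t] = h
    return out

def encode_language(x, p):
    """BiLSTM, then 1-D convolution with ReLU, then max-over-time pooling.
    The ReLU and the pooling are assumptions, not taken from the paper."""
    fwd = lstm_pass(x, *p["fwd"])
    bwd = lstm_pass(x, *p["bwd"], reverse=True)
    h = np.concatenate([fwd, bwd], axis=1)             # (T, 2*d_h)
    k = p["conv_w"].shape[0]                           # filter width
    windows = np.stack([h[t:t + k] for t in range(h.shape[0] - k + 1)])
    conv = np.tensordot(windows, p["conv_w"], axes=([1, 2], [0, 1])) + p["conv_b"]
    return np.maximum(conv, 0.0).max(axis=0)           # (n_filters,)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(docs, encoders, W_out, b_out):
    """Concatenate the per-language representations and apply softmax."""
    feats = np.concatenate([encode_language(docs[lang], encoders[lang])
                            for lang in sorted(encoders)])
    return softmax(feats @ W_out + b_out)

# Hypothetical sizes and random inputs standing in for word embeddings.
rng = np.random.default_rng(0)
d_in, d_h, n_filters, k, n_classes = 8, 6, 5, 3, 4
langs = ["en", "ko", "zh"]
encoders = {
    lang: {"fwd": init_lstm(rng, d_in, d_h),
           "bwd": init_lstm(rng, d_in, d_h),
           "conv_w": rng.normal(0, 0.1, (k, 2 * d_h, n_filters)),
           "conv_b": np.zeros(n_filters)}
    for lang in langs
}
W_out = rng.normal(0, 0.1, (len(langs) * n_filters, n_classes))
b_out = np.zeros(n_classes)
docs = {"zh": rng.normal(size=(12, d_in)),   # 12 tokens
        "en": rng.normal(size=(15, d_in)),   # 15 tokens
        "ko": rng.normal(size=(10, d_in))}   # 10 tokens
probs = classify(docs, encoders, W_out, b_out)
print(probs.shape, float(probs.sum()))
```

Each language has its own encoder, so the parallel versions of a document may differ in length; only the fixed-size pooled vectors are concatenated before the classifier. With untrained weights the predicted distribution is meaningless, and the sketch only shows how the shapes flow through the stages.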