Application Research of Computers

Chinese short text classification based on CP-CNN

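Authors: Yu Bengong, Zhang Lianbin
Affiliations: a. School of Management; b. Key Laboratory of Process Optimization & Intelligent Decision-making of Ministry of Education, Hefei University of Technology, Hefei 230009, China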
Article ID: 1001-3695(2018)04-1001-04
DOI: 10.3969/j.issn.1001-3695.2018.04.009
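Abstract: Short texts are short, have sparse features, and depend strongly on context, so traditional methods that classify them directly achieve only limited accuracy. To address this problem, this paper proposes CP-CNN, a dual-input convolutional neural network model that combines characters and words. The model represents the character-level input as a pinyin sequence, builds dual input matrices at the character level and the word level, and applies k-max pooling in the pooling (sampling) layer to strengthen the expressive power of the model's features. The model's classification accuracy was evaluated on a Douban movie review dataset. The experimental results show that, compared with traditional classification models and a standard convolutional neural network, the model can effectively improve short text classification.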
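To make the pipeline in the abstract more concrete, the Python sketch below shows three pieces under stated assumptions: turning a Chinese string into a pinyin character sequence and one-hot encoding it as a character-level input matrix, stacking word vectors of pre-segmented words into a word-level input matrix, and applying k-max pooling to one feature map. It is not the authors' implementation. The pinyin table `TOY_PINYIN`, the embedding table `TOY_EMBEDDINGS`, the 27-symbol alphabet, the one-hot encoding, and all function names are hypothetical. k-max pooling follows its common definition (keep the k largest activations in their original order), which may differ in detail from the variant used in the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy character-to-pinyin table (tones dropped); a real system
# would need a complete pinyin dictionary.
TOY_PINYIN: Dict[str, str] = {"电": "dian", "影": "ying", "好": "hao", "看": "kan"}

# Hypothetical toy word embeddings (dimension 3) for pre-segmented words.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "电影": [0.1, 0.3, -0.2],
    "好看": [0.7, -0.1, 0.4],
}

# Assumed alphabet for the pinyin input: 26 letters plus a space separator.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def to_pinyin_sequence(text: str, table: Dict[str, str] = TOY_PINYIN) -> str:
    """Replace each Chinese character by its pinyin; characters missing from the table are skipped."""
    return " ".join(table[ch] for ch in text if ch in table)


def char_matrix(seq: str, max_len: int, alphabet: str = ALPHABET) -> List[List[int]]:
    """One-hot encode a pinyin string as a max_len x len(alphabet) matrix (truncated or zero-padded)."""
    index = {c: i for i, c in enumerate(alphabet)}
    rows: List[List[int]] = []
    for c in seq[:max_len]:
        row = [0] * len(alphabet)
        if c in index:  # symbols outside the alphabet stay all-zero
            row[index[c]] = 1
        rows.append(row)
    rows += [[0] * len(alphabet) for _ in range(max_len - len(rows))]
    return rows


def word_matrix(tokens: Sequence[str], embeddings: Dict[str, List[float]],
                dim: int, max_len: int) -> List[List[float]]:
    """Stack word vectors into a max_len x dim matrix; unknown words and padding rows are zeros."""
    rows = [list(embeddings.get(t, [0.0] * dim)) for t in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def k_max_pooling(feature_map: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations of a 1-D feature map, preserving their original order."""
    if k >= len(feature_map):
        return list(feature_map)
    top = sorted(range(len(feature_map)), key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


if __name__ == "__main__":
    pinyin = to_pinyin_sequence("电影好看")
    chars = char_matrix(pinyin, max_len=20)
    words = word_matrix(["电影", "好看"], TOY_EMBEDDINGS, dim=3, max_len=4)
    print(pinyin, len(chars), len(words))
    print(k_max_pooling([0.2, 0.8, 0.1, 0.5, 0.9], k=3))
```

Running the module prints the pinyin string `dian ying hao kan` with the two matrix heights, then the pooled values `[0.8, 0.5, 0.9]`. In a full model the two matrices would presumably feed separate convolutional branches whose pooled outputs are concatenated before the classifier; that wiring is also an assumption here. Keeping the top k activations in order, rather than only the single maximum, retains some information about how strong features are distributed along the text.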
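Keywords: short text; classification; convolutional neural network (CNN)
Funding: Humanities and Social Sciences Foundation of the Ministry of Education of China (2012JYRW0710); National Natural Science Foundation of China (71671057)
Article URL: http://www.arocmag.com/article/01-2018-04-009.html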
Received: 2016/12/10
Revised: 2017/2/17
Pages: 1001-1004
CLC number: TP391.1
Document code: A