《计算机应用研究》|Application Research of Computers

Sentiment analysis of micro-blog based on SVM and CRF using various combinations of features

Authors LI Ting-ting (李婷婷), JI Dong-hong (姬东鸿)
Affiliation School of Computer, Wuhan University, Wuhan 430000, China
Article ID 1001-3695(2015)04-0978-04
DOI 10.3969/j.issn.1001-3695.2015.04.004
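Abstract In recent years, sentiment analysis of text has been an active research topic in natural language processing. Microblog posts are short texts with concise wording that are rich in opinions, tendencies, and attitudes, so identifying their sentiment orientation has considerable practical value. This paper proposes a sentiment analysis method based on SVM and CRF that uses multiple text features, including words, parts of speech, sentiment words, negation words, degree adverbs, and special symbols, and compares different feature combinations over several groups of experiments to find the best-performing configuration. The experiments show that the SVM model reaches an accuracy of 88.72% with the combination of part-of-speech, sentiment-word, and negation-word features, and that the CRF model reaches 90.44% with the combination of sentiment-word, negation-word, degree-adverb, and special-symbol features.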
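Keywords microblog; sentiment analysis; support vector machine; conditional random field
Funding Key Program of the National Natural Science Foundation of China (61133012)
General Program of the National Natural Science Foundation of China (61173062)
Article URL http://www.arocmag.com/article/01-2015-04-004.html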
Author names (English) LI Ting-ting, JI Dong-hong
Affiliation (English) School of Computer, Wuhan University, Wuhan 430000, China
English abstract Text sentiment analysis has been a popular topic in natural language processing in recent years. Micro-blog posts are short texts that are concise in wording yet rich in views, tendencies, and attitudes, so identifying their sentiment orientation is of practical importance. This paper proposes a sentiment analysis method based on SVM and CRF that uses various features, including words, parts of speech, sentiment words, negation words, degree adverbs, and special symbols. Different combinations of these features are evaluated over multiple sets of experiments to find the most effective one. The accuracy of SVM reached 88.72% with the combination of part of speech, sentiment words, and negation words, while CRF attained 90.44% with the combination of sentiment words, negation words, degree adverbs, and special symbols.
English keywords micro-blog; sentiment analysis; SVM; CRF
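The feature-combination idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny English lexicons, the group names, and the helper functions `extract_features` and `feature_combinations` are all hypothetical, and the part-of-speech feature is omitted because it would need a tagger. The sketch only shows how lexicon-based feature groups might be switched on and off, and how candidate combinations might be enumerated before each one is used to train a classifier.

```python
from itertools import combinations

# Toy lexicons: illustrative assumptions, not the resources used in the paper.
SENTIMENT_WORDS = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATION_WORDS = {"not", "never", "no"}
DEGREE_ADVERBS = {"very", "extremely", "slightly"}
SPECIAL_SYMBOLS = {"!", "?", ":)", ":("}

# Hypothetical names for the feature groups (part of speech is left out).
FEATURE_GROUPS = ("word", "sentiment", "negation", "degree", "symbol")


def extract_features(tokens, groups):
    """Return a sparse count dict for one tokenized post,
    using only the feature groups named in `groups`."""
    feats = {}

    def bump(key):
        feats[key] = feats.get(key, 0) + 1

    for tok in tokens:
        low = tok.lower()
        if "word" in groups:
            bump("word=" + low)
        if "sentiment" in groups and low in SENTIMENT_WORDS:
            bump("sent_pos" if SENTIMENT_WORDS[low] > 0 else "sent_neg")
        if "negation" in groups and low in NEGATION_WORDS:
            bump("negation")
        if "degree" in groups and low in DEGREE_ADVERBS:
            bump("degree")
        if "symbol" in groups and low in SPECIAL_SYMBOLS:
            bump("symbol")
    return feats


def feature_combinations(groups=FEATURE_GROUPS):
    """Yield every non-empty combination of feature groups."""
    for size in range(1, len(groups) + 1):
        yield from combinations(groups, size)


if __name__ == "__main__":
    post = ["not", "very", "good", "!"]
    combo = ("sentiment", "negation", "degree", "symbol")
    print(extract_features(post, combo))
    print(sum(1 for _ in feature_combinations()), "candidate combinations")
```

In an experiment along these lines, each combination would produce one feature representation of the corpus, a model would be trained and scored on it, and the combination with the highest accuracy would be reported.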
Received 2014/3/5
Revised 2014/4/20
Pages 978-981
CLC number TP391.1
Document code A