《计算机应用研究》|Application Research of Computers

Using lexical statistical features in extracting sentimental words and classifying product reviews

(Chinese title: 单词统计特性在情感词自动抽取和商品评论分类中的作用)

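Authors: Han Tonghui (韩彤晖), Yang Dongqiang (杨东强), Ma Hongwei (马宏伟)
Affiliation: School of Computer Science and Technology, Shandong Jianzhu University (山东建筑大学 计算机科学与技术学院), Jinan 250100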
Article ID: 1001-3695(2019)03-044-0866-07
DOI: 10.19734/j.issn.1001-3695.2017.09.0913
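Abstract: Statistical features of words are widely used in natural language processing. To examine how such features affect the precision of keyword extraction and text classification, this paper analyzes eight common statistical features and studies their role in sentiment analysis through two tasks: sentiment word extraction and product review classification. Building the text vector space model from the eight statistical features, rather than from individual words, reduces the dimensionality of the text vectors and gives a compression effect similar to that of a latent semantic space (LSA/SVD). It lowers the complexity of the algorithm while preserving classification precision, so it can replace the traditional vector space model. The sentiment word extraction experiments show that combining statistical features with part-of-speech tags reaches a precision of 76.4%, significantly higher than extraction based on statistical features alone or on part-of-speech tags alone. The product review classification tests show that, compared with traditional word-based sentiment classification, classification based on statistical features improves precision by 10.8%.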
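The dimensionality argument in the abstract can be illustrated with a minimal sketch. The abstract does not name the eight features, so the three per-word statistics used here (relative corpus frequency, document frequency ratio, inverse document frequency) are illustrative stand-ins, and the function names and toy reviews are hypothetical. The sketch only shows that a word-based vector grows with the vocabulary, while a vector built from word statistics keeps a fixed length.

```python
import math
from collections import Counter

N_FEATURES = 3  # stand-in; the paper uses eight features that the abstract does not list


def word_statistics(docs):
    """Compute illustrative per-word corpus statistics (not the paper's feature set)."""
    n_docs = len(docs)
    tf = Counter(w for d in docs for w in d)        # corpus-wide term counts
    df = Counter(w for d in docs for w in set(d))   # number of documents containing w
    total = sum(tf.values())
    return {
        w: (
            tf[w] / total,             # relative corpus frequency
            df[w] / n_docs,            # document frequency ratio
            math.log(n_docs / df[w]),  # inverse document frequency
        )
        for w in tf
    }


def bag_of_words_vector(doc, vocab):
    """Word-based vector: one dimension per vocabulary word."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def statistical_vector(doc, stats):
    """Fixed-length vector: mean of each statistic over the document's known tokens."""
    known = [stats[w] for w in doc if w in stats]
    if not known:
        return [0.0] * N_FEATURES
    return [sum(col) / len(known) for col in zip(*known)]


# Hypothetical toy reviews, tokenized by whitespace.
docs = [
    "great phone great battery".split(),
    "poor battery poor screen".split(),
    "great screen".split(),
]
vocab = sorted({w for d in docs for w in d})
stats = word_statistics(docs)

bow = bag_of_words_vector(docs[0], vocab)
vec = statistical_vector(docs[0], stats)
print(len(bow), len(vec))  # 5 3
```

With a realistic corpus the word-based vector would have tens of thousands of dimensions, while the statistical vector stays at the number of chosen features. Whether averaging is the right way to aggregate word statistics into a document vector is an assumption of this sketch, not something the abstract specifies.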
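Keywords: statistical features; sentiment word extraction; product review classification
Funding: Supported by the General Program of Humanities and Social Sciences Research of the Ministry of Education of China (15YJA740054)
Article URL: http://www.arocmag.com/article/01-2019-03-044.html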
Authors (English): Han Tonghui, Yang Dongqiang, Ma Hongwei
Affiliation (English): School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250100, China
English abstract: The statistical features of words are widely used in natural language processing. This paper summarizes eight types of statistical features and studies their role in extracting sentiment words and classifying product reviews. Instead of the high-dimensional lexical representation used in vector space models (VSM), it represents words and documents with only these eight statistical features, which lowers the VSM's dimensionality and approximates a latent semantic space without the time and space cost of an SVD computation. The sentiment word extraction results show that combining these statistical features with the PoS tags of words achieves a precision of 76.4%, much higher than the other methods. The product review classification results show that, compared with using sentiment words to construct the feature space, using these eight kinds of statistical features alone improves classification precision by 10.8%.
English keywords: statistical features; extracting sentimental words; classifying product reviews
Received: 2017/9/7
Revised: 2017/10/23
Pages: 866-872
CLC number: TP391
Document code: A