《计算机应用研究》|Application Research of Computers

Cross-domain sentiment classification based on word2vec

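Authors Wang Qinqin (王勤勤), Zhang Yuhong (张玉红), Li Peipei (李培培), Hu Xuegang (胡学钢)
Affiliation School of Computer Science and Information Engineering, Hefei University of Technology (合肥工业大学 计算机与信息学院), Hefei 230009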
Article ID 1001-3695(2018)10-2924-04
DOI 10.3969/j.issn.1001-3695.2018.10.010
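Abstract Sentiment classification determines the sentiment polarity of data and is widely applied to product reviews, microblog topics, and similar data. Because labeled data is expensive to obtain, traditional sentiment classification methods struggle to classify data from different domains effectively, and the cross-domain sentiment classification problem has therefore attracted wide attention. Most existing cross-domain sentiment classification methods extract lexical and syntactic features on the basis of co-occurrence and ignore the semantic relationships between words. To address this, the paper proposes WEEF (cross-domain classification based on word embedding extension feature), a word2vec-based cross-domain sentiment classification method. It selects high-quality features that co-occur across domains as a bridge and uses them as seeds; then, based on word-vector similarity, it expands domain-specific features into these seeds to form feature clusters, thereby reducing the divergence between domains. Experimental results on the SRAA and Amazon product review datasets demonstrate the effectiveness of the method, especially when the amount of data is large.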
Keywords semantic features; co-occurrence features; word vectors; cross-domain sentiment classification
Funding National Key Research and Development Program of China (2016YFC0801406)
National Natural Science Foundation of China (61673152, 61503112)
Article URL http://www.arocmag.com/article/01-2018-10-010.html
English title Cross-domain sentiment classification based on word2vec
Authors (English) Wang Qinqin, Zhang Yuhong, Li Peipei, Hu Xuegang
Affiliation (English) School of Computer Science & Information Engineering, Hefei University of Technology, Hefei 230009, China
English abstract Sentiment classification aims to judge the sentiment polarity of review holders and is widely applied to commodity comments, Weibo topics, and similar data. Because labeling is expensive, cross-domain sentiment classification has attracted increasing attention. However, most cross-domain sentiment classification methods extract lexical and syntactic features based on co-occurrence relationships and ignore the semantic information among words. Motivated by this, the paper proposes WEEF, a feature extension approach based on word2vec word embeddings, for cross-domain sentiment classification. First, it selects high-quality domain-independent features as a bridge and uses them as seeds. Second, it expands domain-specific features into the seeds based on word-embedding similarity, generating feature clusters that reduce the divergence between domain-specific words in different domains. Finally, experiments conducted on the SRAA and Amazon product review datasets show the effectiveness of the proposed approach, especially on large-scale datasets.
English keywords semantic characteristics; co-occurrence characteristics; word vector; cross-domain sentiment classification
Received 2017/5/9
Revised 2017/6/20
Pages 2924-2927
CLC number TP181
Document code A
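
The seed-and-expand idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how "high-quality" co-occurrence features are chosen or which similarity measure is used, so the frequency cutoff (`min_count`), the cosine similarity, the `threshold` value, the function names, and the toy two-dimensional vectors below are all assumptions made for illustration. In practice the vectors would come from a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_seeds(source_counts, target_counts, min_count=2):
    """Hypothetical seed selection: words that occur at least `min_count`
    times in BOTH domains are treated as domain-shared bridge features."""
    shared = set(source_counts) & set(target_counts)
    return sorted(
        w for w in shared
        if source_counts[w] >= min_count and target_counts[w] >= min_count
    )


def expand_seeds(seeds, specific_words, vectors, threshold=0.8):
    """Attach each domain-specific word to its most similar seed, provided
    the similarity reaches `threshold`; returns seed -> feature cluster."""
    clusters = {seed: [seed] for seed in seeds}
    for word in specific_words:
        if word not in vectors:
            continue
        best_seed, best_sim = None, threshold
        for seed in seeds:
            if seed not in vectors:
                continue
            sim = cosine(vectors[word], vectors[seed])
            if sim >= best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None:
            clusters[best_seed].append(word)
    return clusters


# Toy data (invented for illustration): word counts in two review domains.
source_counts = {"good": 5, "bad": 4, "thrilling": 3, "boring": 2}
target_counts = {"good": 6, "bad": 3, "sharp": 4, "blurry": 2}

# Toy 2-D "embeddings"; real word2vec vectors have far more dimensions.
vectors = {
    "good": [1.0, 0.1], "bad": [-1.0, 0.1],
    "thrilling": [0.8, 0.2], "boring": [-0.95, 0.0],
    "sharp": [0.9, 0.3], "blurry": [-0.9, 0.2],
}

seeds = select_seeds(source_counts, target_counts)
specific = sorted((set(source_counts) | set(target_counts)) - set(seeds))
clusters = expand_seeds(seeds, specific, vectors)
print(clusters)
```

With these toy inputs, the source-only words ("thrilling", "boring") and the target-only words ("sharp", "blurry") end up in the same clusters as the shared seeds "good" and "bad", which is the sense in which feature clusters can bridge two domains. A classifier could then use cluster membership, rather than individual words, as features.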