《计算机应用研究》|Application Research of Computers

Code-switching short-text sentiment classification method based on multi-channel deep learning network

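Authors: Zhang Yang (张洋), Hu Yan (胡燕)
Affiliation: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China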
Article ID: 1001-3695(2021)01-014-0069-06
DOI: 10.19734/j.issn.1001-3695.2019.12.0616
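Abstract: Compared with sentiment classification of single-language short texts, code-switching text is harder to classify: the words that express emotion come from more than one language and the grammatical structure is complex, so traditional word embedding alone does not let the classifier learn enough useful features, and classification performance suffers. To address these problems, this paper proposes a dual-channel composite model that fuses character and word features. First, to handle dataset imbalance, it proposes a dataset undersampling algorithm based on BERT semantic similarity. Second, it builds a dual-channel deep learning network in which the original data, embedded at the character level and at the word level, are sent through two channels into modules composed of a CNN and an LSTM with an attention mechanism for multi-granularity feature extraction. Finally, the features from the channels are fused for classification. Experiments on the code-switching five-class dataset released for NLPCC 2018 Task 1 show that the overall performance of the model improves on that of current representative deep learning models.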
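
The abstract names an undersampling algorithm based on BERT semantic similarity but does not spell it out. The following is only a minimal sketch of one plausible reading, not the authors' algorithm: it assumes each text already has a sentence embedding (for example from a BERT encoder, which is not run here) and thins every over-represented class by skipping samples whose cosine similarity to an already kept sample reaches a threshold. The names `cosine`, `undersample`, `target_size`, and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def undersample(embeddings, labels, target_size, threshold=0.9):
    """Return sorted indices of the samples to keep.

    Assumption (not from the paper): classes with more than target_size
    samples are reduced to target_size, preferring samples that are not
    near-duplicates (similarity >= threshold) of ones already kept.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    kept = []
    for indices in by_class.values():
        if len(indices) <= target_size:
            kept.extend(indices)  # small class: keep every sample
            continue
        chosen, skipped = [], []
        for idx in indices:
            if len(chosen) == target_size:
                break
            redundant = any(
                cosine(embeddings[idx], embeddings[j]) >= threshold
                for j in chosen
            )
            (skipped if redundant else chosen).append(idx)
        # If too few distinct samples were found, top up with skipped ones.
        chosen.extend(skipped[: target_size - len(chosen)])
        kept.extend(chosen)
    return sorted(kept)
```

Calling `undersample(embeddings, labels, target_size=2)` returns the indices to retain; in practice the threshold and target size would need tuning against the actual class distribution.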
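Keywords: code-switching short text; multi-channel; attention mechanism; feature fusion
Funding: Natural Science Foundation of Hubei Province (2019CFC919)
Article URL: http://www.arocmag.com/article/01-2021-01-014.html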
Authors (English): Zhang Yang, Hu Yan
Affiliation (English): School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430070, China
Abstract (English): Compared with sentiment classification of single-language short texts, sentiment classification of code-switching short texts is more challenging: the words that express emotion are not confined to one language and the sentences have complex grammatical structure, so traditional word embedding alone cannot give the classifier enough useful features, which results in poor classification performance. This paper proposes a dual-channel deep learning model that integrates character and word features. First, to address the imbalanced dataset, it proposes a data undersampling algorithm based on BERT semantic similarity. Second, it constructs a dual-channel deep learning network in which the original data, embedded as characters and as words, are sent through two channels to modules composed of a CNN and an LSTM with an attention mechanism to extract multi-level features. Finally, the features from the two channels are fused for classification. Experimental results on the code-switching five-category dataset published for NLPCC 2018 Task 1 show that the overall performance of the proposed model improves on that of current representative deep learning models.
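
The attention and fusion steps named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' network: the CNN and LSTM layers are omitted, the per-channel hidden states are assumed to be given, dot-product attention with a single query vector stands in for whatever attention form the paper uses, and fusion is assumed to be plain concatenation followed by a linear layer and softmax over five classes. All function and parameter names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the steps.

    hidden_states: list of per-step feature vectors (e.g. LSTM outputs).
    query: attention vector, assumed to be learned elsewhere.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]


def fuse_and_classify(char_feat, word_feat, weight, bias):
    """Concatenate the two channel features and apply a linear softmax layer.

    weight: one row per class, each row as long as the fused vector.
    bias: one value per class.
    """
    fused = char_feat + word_feat  # list concatenation = feature fusion
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(logits)
```

In a full model, `attention_pool` would run once per channel on that channel's sequence features, and `fuse_and_classify` would receive the two pooled vectors; a real implementation would use a deep learning framework with trainable parameters rather than plain lists.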
Keywords (English): code-switching text; multi-channel; attention mechanism; fusion features
Received: 2019/12/5
Revised: 2020/2/1
Pages: 69-74
CLC number: TPN26
Document code: A