《计算机应用研究》|Application Research of Computers

基于深度学习的中文微博作者身份识别研究

Research on author identity recognition of Chinese microblog based on deep learning

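Authors Xu Xiaolin (徐晓霖), Cai Manchun (蔡满春), Lu Tianliang (芦天亮)
Affiliation School of Information Technology & Network Security, People's Public Security University of China, Beijing 102623, China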
Article number 1001-3695(2020)01-003-0016-03
DOI 10.19734/j.issn.1001-3695.2018.05.0486
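Abstract Author identification has long played an important role in public security work and questioned-document examination. Existing approaches to modeling an author's language style are cumbersome, and hand-crafted text features do not generalize. To address this problem, this paper proposes CABLSTM, a Chinese microblog author identification model that requires no expert feature modeling, and tests its accuracy on a public microblog corpus. To extract as many features as possible from short texts, the model fuses an attention mechanism into a CNN and removes the pooling layer; a bidirectional LSTM then captures context-dependent information, and a softmax layer outputs the identification result. Experimental results show that, on the Chinese microblog author identification task, the model achieves some improvement in accuracy, recall, and <i>F</i>-measure over traditional machine learning algorithms, TextCNN, and LSTM.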
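Keywords author identification; long short-term memory network; convolutional neural network; automatic feature extraction
Funding National Key R&D Program of China, Key Special Project (2017YFB0802804)
National Natural Science Foundation of China (61602489)
People's Public Security University of China 2018 Fundamental Research Funds, Research Institution Project (2018JKF504)
Article URL http://www.arocmag.com/article/01-2020-01-003.html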
English title Research on author identity recognition of Chinese microblog based on deep learning
Author names (English) Xu Xiaolin, Cai Manchun, Lu Tianliang
Affiliation (English) School of Information Technology & Network Security, People's Public Security University of China, Beijing 102623, China
English abstract Author identification has always played an important role in public security work and questioned-document examination. Existing text feature extraction is cumbersome and not universal. To solve this problem, this paper proposes CABLSTM, a Chinese microblog author identification model that needs no expert feature modeling, and tests the accuracy of the model on an open microblog corpus. To maximize the extraction of short-text features, the model fuses the attention mechanism into the CNN and removes the pooling layer, then obtains context-related information through a bidirectional LSTM. The identification result is output through the softmax layer. Experimental results show that, on the Chinese microblog author identification task, the model achieves a certain improvement in accuracy, recall, and <i>F</i>-measure compared with traditional machine learning algorithms, TextCNN, and LSTM.
English keywords author identification; LSTM; CNN; automatic feature extraction
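
The abstract describes the CABLSTM pipeline only at a high level: a CNN with attention and no pooling layer, then a bidirectional LSTM, then a softmax output. The following is a minimal forward-pass sketch of that data flow in plain Python, not the authors' implementation. The abstract does not say how attention is fused into the CNN, so the sketch assumes a simple scheme in which a softmax over positions rescales the convolution outputs. All function names, layer sizes, and the random untrained weights are hypothetical, and biases are omitted for brevity.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def conv_no_pool(seq, W, width):
    # Same-padded 1-D convolution + ReLU: one feature vector per token, no pooling.
    dim = len(seq[0])
    pad = [[0.0] * dim] * (width // 2)
    padded = pad + seq + pad
    out = []
    for t in range(len(seq)):
        window = [v for vec in padded[t:t + width] for v in vec]
        out.append([max(0.0, h) for h in matvec(W, window)])
    return out

def attend(features, v):
    # ASSUMED attention: score each position, softmax over positions, rescale features.
    weights = softmax([sum(a * b for a, b in zip(v, f)) for f in features])
    return [[w * x for x in f] for w, f in zip(weights, features)], weights

def lstm(seq, W, hidden):
    # W has 4*hidden rows over [x; h]; gate order: input, forget, output, candidate.
    h, c, states = [0.0] * hidden, [0.0] * hidden, []
    for x in seq:
        z = matvec(W, x + h)
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        states.append(h)
    return states

def init_params(vocab=50, dim=8, filters=6, width=3, hidden=5, authors=4, seed=0):
    # Hypothetical sizes; the weights are random, so predictions are meaningless.
    rng = random.Random(seed)
    return {
        "E": rand_matrix(rng, vocab, dim),
        "Wc": rand_matrix(rng, filters, width * dim),
        "va": [rng.uniform(-0.1, 0.1) for _ in range(filters)],
        "Wf": rand_matrix(rng, 4 * hidden, filters + hidden),
        "Wb": rand_matrix(rng, 4 * hidden, filters + hidden),
        "Wo": rand_matrix(rng, authors, 2 * hidden),
        "width": width,
        "hidden": hidden,
    }

def cablstm_forward(token_ids, params):
    emb = [params["E"][t] for t in token_ids]
    feats = conv_no_pool(emb, params["Wc"], params["width"])
    attended, weights = attend(feats, params["va"])
    fwd = lstm(attended, params["Wf"], params["hidden"])
    bwd = lstm(attended[::-1], params["Wb"], params["hidden"])
    summary = fwd[-1] + bwd[-1]  # concatenate the final states of both directions
    return softmax(matvec(params["Wo"], summary)), weights

params = init_params()
probs, attn = cablstm_forward([3, 17, 42, 8, 8, 1], params)
predicted_author = max(range(len(probs)), key=probs.__getitem__)
```

Because the pooling layer is omitted, the convolution keeps one feature vector per token, so the bidirectional LSTM still sees the full sequence. `probs` holds one probability per candidate author and `attn` one weight per token position.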
 
Received 2018/5/29
Revised 2018/7/11
Pages 16-18,25
CLC number TP391.72
Document code A
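
The abstract reports results in terms of accuracy, recall, and F-measure. As a rough illustration of how such scores can be computed for a multi-author classification task, here is a small sketch. The abstract does not say how per-author scores are averaged, so macro-averaging is an assumption, and the function name and toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro recall, macro F-measure); macro-averaging is assumed."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f_scores.append(f)
    return accuracy, sum(recalls) / len(labels), sum(f_scores) / len(labels)

# Toy example: three authors, six microblog posts.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
accuracy, macro_recall, macro_f = evaluate(y_true, y_pred)
```

On this toy input, four of six posts are attributed correctly, so accuracy and macro recall are both 2/3, and the macro F-measure is about 0.656.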