《计算机应用研究》|Application Research of Computers

Research on topic detection and expression method for Weibo hot events
(面向微博热点事件的话题检测及表述方法研究)

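Authors Zhou Weixiang (周炜翔), Zhang Yangsen (张仰森), Zhang Liang (张良)
Affiliation Institute of Intelligent Information Processing, Beijing Information Science & Technology University, Beijing 100101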
Article ID 1001-3695(2019)12-009-3565-05
DOI 10.19734/j.issn.1001-3695.2018.08.0601
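Abstract To address the difficulty of detecting hot topics caused by the sparsity of Weibo text data, this paper proposes a topic detection model based on IDLDA-ITextRank. First, by introducing Weibo time-series features and word-frequency features, an IDLDA topic text clustering model is constructed and used to cluster texts on the same topic into a text set TS. Next, using a similarity measure that combines edit distance with character vectors, an ITextRank model for text summarization and keyword extraction is constructed to extract a summary and keywords from TS. Finally, word mutual information and left-right information entropy are used to convert the extracted keywords into key topic phrases, which are then combined with the summary to express the topic content. Experiments show that the IDLDA model clusters topic texts better than the traditional BTM and LDA models, and that expressing Weibo topics with key topic phrases and summaries is more comprehensible than expressing them directly with topic words.
Keywords text clustering; IDLDA-ITextRank model; topic extraction; topic expression
Funding National Natural Science Foundation of China (61772081)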
Article URL http://www.arocmag.com/article/01-2019-12-009.html
English title Research on topic detection and expression method for Weibo hot events
Authors (English) Zhou Weixiang, Zhang Yangsen, Zhang Liang
Affiliation (English) Institute of Intelligent Information Processing, Beijing Information Science & Technology University, Beijing 100101, China
English abstract To address the problem that the sparseness of Weibo text data makes hot topics difficult to detect, this paper proposes a topic detection model based on IDLDA-ITextRank. First, it constructs an IDLDA topic text clustering model by introducing Weibo time-series features and word-frequency features, and uses the model to cluster texts on the same topic into a text set (TS). Second, using a similarity calculation method that combines edit distance and character vectors, it constructs the ITextRank text summarization and keyword extraction model to extract the summaries and keywords of the TS. Finally, it uses the mutual information of words and left-right information entropy to convert the extracted keywords into key topic phrases, and combines the key topic phrases and summaries to express the topic content. Experiments show that the IDLDA model clusters topic texts better than the traditional BTM and LDA models, and that key topic phrases combined with summaries express Weibo topics more comprehensibly than topic words alone.
English keywords text clustering; IDLDA-ITextRank model; topic extraction; topic expression
Received 2018/8/26
Revised 2018/10/9
Pages 3565-3569,3578
CLC number TP391.1
Document code A
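
Illustrative sketch. The abstract says that ITextRank ranks text using a similarity measure combining edit distance with character vectors, but this record does not give the formula. The Python sketch below shows one way such a blend could feed a TextRank-style ranking. The linear weighting `alpha`, the toy `char_vector` (a bag-of-characters stand-in for trained character embeddings), and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def char_vector(text: str, dim: int = 32) -> list:
    """ASSUMPTION: toy bag-of-characters vector standing in for real
    character embeddings, which would be trained on a corpus."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def cosine(u: list, v: list) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def blended_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """ASSUMPTION: linear blend of the two similarities, weighted by alpha."""
    return (alpha * edit_similarity(a, b)
            + (1 - alpha) * cosine(char_vector(a), char_vector(b)))


def text_rank(sentences, damping=0.85, iterations=50, alpha=0.5):
    """Weighted PageRank over a sentence graph whose edge weights are the
    blended similarities; returns one score per input sentence."""
    n = len(sentences)
    if n == 0:
        return []
    w = [[0.0 if i == j else blended_similarity(sentences[i], sentences[j], alpha)
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(w[j][i] / out_sum[j] * scores[j]
                                  for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores
```

Under these assumptions, the highest-scoring sentences in a clustered text set would be candidates for the summary; the same ranking applied to a word graph would yield candidate keywords.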