《计算机应用研究》|Application Research of Computers

基于文本聚类与兴趣衰减的微博用户兴趣挖掘方法

Microblog user interest mining based on text clustering and interest decay

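Authors 秦永彬 (Qin Yongbin), 孙玉洁 (Sun Yujie), 魏笑 (Wei Xiao)
Affiliation Guizhou University: a. College of Computer Science & Technology; b. Guizhou Key Laboratory of Public Big Data, Guiyang 550025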
Article ID 1001-3695(2019)05-039-1469-05
DOI 10.19734/j.issn.1001-3695.2017.11.0743
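Abstract Considering the characteristics of user interest and of microblog content, this paper proposes a microblog user interest mining method based on text clustering and interest decay (TCID-MUIM). First, a synonym-merging strategy based on a synonym word forest (Cilin) compensates for the shortage of word-frequency information during modeling. Next, a double single-pass incomplete clustering algorithm partitions a user's microblogs into clusters, and each cluster is merged into a single document, which offsets the difficulty of mining topic information from short microblog texts. Finally, the documents are modeled with LDA and, because user interest changes over time, a time factor is introduced to compress the microblog-topic matrix into a user-topic matrix that gives the user's interests. Experiments show that, compared with traditional modeling and with modeling that merges all of a user's historical microblogs into one document, the interest topics mined by TCID-MUIM are better differentiated and closer to the user's real interest preferences.
Keywords microblog; single-pass clustering; LDA model; user interest mining; interest decay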
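Funding National Natural Science Foundation of China, Major Research Plan project (91746116)
Guizhou Province Science and Technology Major Special Project (黔科合重大专项字[2017]3002)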
Article URL http://www.arocmag.com/article/01-2019-05-039.html
English title Microblog user interest mining based on text clustering and interest decay
Authors (English) Qin Yongbin, Sun Yujie, Wei Xiao
Affiliation (English) a. College of Computer Science & Technology, b. Guizhou Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
English abstract Based on the characteristics of user interest and microblog information, this paper puts forward a method of microblog user interest mining based on text clustering and interest decay (TCID-MUIM). Firstly, it uses a synonym-merging strategy based on a synonym word forest to make up for the lack of word-frequency information in the modeling process. Secondly, it uses the double single-pass incomplete clustering algorithm to divide a user's microblogs into clusters and merges each cluster into one document, which addresses the problem that microblog texts are too short for topic information to be mined easily. Finally, it applies the LDA model and, considering that a user's interest changes with time, introduces a time factor to compress the microblog-topic matrix into the user-topic matrix and obtain the user's interest. Experimental results show that, compared with traditional modeling methods and with methods that merge all of a user's historical microblogs into one document, the TCID-MUIM method yields user interest topics that are more clearly differentiated and closer to the user's real interest preferences.
English keywords microblog; single-pass clustering; LDA model; user interest mining; interest decay
Received 2017/11/17
Revised 2018/1/8
Pages 1469-1473
CLC number TP301.6; TP391
Document code A
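
Illustrative sketch (not from the paper). The abstract names two steps that are easy to picture in code: single-pass clustering of a user's microblogs, and a time factor that compresses the microblog-topic matrix into a user-topic vector. The Python below is a minimal sketch under stated assumptions and is not the authors' implementation: it shows a plain single-pass clusterer rather than the paper's double single-pass incomplete variant, and the cosine similarity, the threshold of 0.3, the exponential half-life decay, and all function and parameter names are hypothetical choices made here for illustration. The abstract does not specify any of them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.3):
    """Plain single-pass clustering (assumed variant, not the paper's).

    docs: list of token lists, processed once in order.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [indices]}
    for i, tokens in enumerate(docs):
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: join the closest cluster and update its centroid.
            best["members"].append(i)
            best["centroid"].update(vec)
        else:
            # Otherwise the document seeds a new cluster.
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [c["members"] for c in clusters]


def user_interest(doc_topic, ages_days, half_life=30.0):
    """Decay-weighted average of per-document topic distributions.

    doc_topic: rows of topic probabilities, one row per document.
    ages_days: age of each document in days (0 = most recent).
    half_life: assumed decay parameter; weight halves every half_life days.
    """
    if not doc_topic:
        return []
    weights = [0.5 ** (age / half_life) for age in ages_days]
    total = sum(weights)
    n_topics = len(doc_topic[0])
    return [
        sum(w * row[j] for w, row in zip(weights, doc_topic)) / total
        for j in range(n_topics)
    ]
```

In this sketch, each cluster returned by `single_pass` would be concatenated into one document before topic modeling, and `user_interest` would then be applied to the resulting document-topic rows so that older documents contribute less to the final user-topic vector.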