Application Research of Computers

Research of microblog user interest mining based on microblog posts

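Authors: Xiong Caiwei, Cao Yanan
Affiliations: 1. National Key Engineering Laboratory, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Computer & Control Engineering, University of Chinese Academy of Sciences, Beijing 100093, China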
Article number: 1001-3695(2018)06-1619-05
DOI: 10.3969/j.issn.1001-3695.2018.06.004
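Abstract: To address the problem of missing interest attributes for microblog users, this paper proposes a microblog user interest mining method based on analyzing the content users post. Using a phrase-based topic model and an automatically constructed user interest knowledge base, the method effectively mines high-quality interest phrases from post content and labels their categories, thereby mining the interests of microblog users. Experimental results on the SMP CUP 2016 dataset show that the topical phrase model (phrase-LDA) outperforms traditional topic models in both perplexity and phrase quality, and that the accuracy and recall of user interest mining reach up to 78% and 82%, respectively.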
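The abstract compares topic models by perplexity and reports accuracy and recall for the mined interests. The sketch below shows the standard textbook definitions of these metrics, not the paper's evaluation code: perplexity for an LDA-style model, where each token's probability in document d is p(w|d) = Σ_k θ_dk φ_kw, and set-based precision and recall over one user's interest labels. The function names, the data layout (lists of token ids, one θ row per document, one φ row per topic), and the assumption that the reported "accuracy" corresponds to set-based precision are all illustrative assumptions; for a phrase-based model the tokens would be phrases rather than single words.

```python
import math
from typing import Sequence, Set, Tuple

# Hypothetical helper names for illustration; not taken from the paper.


def perplexity(docs: Sequence[Sequence[int]],
               theta: Sequence[Sequence[float]],
               phi: Sequence[Sequence[float]]) -> float:
    """Perplexity of an LDA-style topic model.

    docs[d]  : token ids (words or phrases) of document d
    theta[d] : topic distribution of document d, length K
    phi[k]   : token distribution of topic k, length V
    p(w | d) = sum_k theta[d][k] * phi[k][w]
    perplexity = exp(-(sum of log p(w | d) over all tokens) / number of tokens)
    Assumes smoothed distributions, so every p(w | d) is strictly positive.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-log_lik / n_tokens)


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of mined interest labels for one user."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # One topic that is uniform over a 4-token vocabulary: perplexity equals 4.
    print(round(perplexity([[0, 1, 2, 3]], [[1.0]], [[0.25] * 4]), 6))  # 4.0
    # Two of three predicted interests are correct and both gold interests are found.
    print(precision_recall({"sports", "music", "tech"}, {"sports", "tech"}))
```

A lower perplexity means the model assigns higher probability to the observed tokens; a uniform model over V tokens scores exactly V, which makes it a convenient sanity check.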
Keywords: microblog; microblog posts; interest mining; topical phrase model (phrase-LDA); knowledge base
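Funding: Youth Science Fund of the National Natural Science Foundation of China (61403369); Major Special Project of the Ministry of Science and Technology of China (2016YFB0801300)
Article URL: http://www.arocmag.com/article/01-2018-06-004.html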
Received: 2017/1/24
Revised: 2017/3/14
Pages: 1619-1623
CLC number: TP301.6
Document code: A