《计算机应用研究》|Application Research of Computers

Blog clustering method based on selection of joint keywords using blog connect platform

(Original Chinese title: 利用博客链接平台选取联合关键字的博客聚类方法)

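Authors 王琦 (Wang Qi), 霍纬纲 (Huo Weigang)
Affiliations 1. Department of Computer Science and Technology, Yuncheng University, Yuncheng, Shanxi 044000, China; 2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China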
Article ID 1001-3695(2017)12-3560-04
DOI 10.3969/j.issn.1001-3695.2017.12.009
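Abstract Full-text keyword retrieval has a high time cost, and tags or categories give rise to problems such as semantic ambiguity and synonyms. To address these problems, this paper proposes selecting joint keywords on a blog connect platform for blog clustering. It assumes that the candidate keywords (or joint keywords) with which a blog post is queried can represent the topic of that post. To verify this assumption, tracking code is first embedded in the blog connect (BC) component to collect the keywords queried by readers; then suitable candidate keywords are selected as joint keywords; finally, four similarity measures, namely overlapping projection, mutual information projection, distributed information, and the Kendall τ coefficient, are used to validate the joint keywords extracted by the BC component. Experimental results show that the proposed method gives searchers a fast path to the corresponding blog; moreover, the generated joint keywords reduce the complexity and redundancy of full-text keyword retrieval and meet the needs of blog users well.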
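The abstract does not spell out how candidate keywords are promoted to joint keywords, so the following is only a minimal sketch under stated assumptions: the query log is taken to be a list of `(post_id, query)` pairs, a joint keyword is taken to be a pair of terms that co-occur in one query, and a pair is kept when it is seen at least `min_count` times for the same post. The function name, the log layout, and the threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def select_joint_keywords(query_log, min_count=2):
    """Pick co-occurring term pairs per post from a reader query log.

    query_log: iterable of (post_id, query_string) pairs (assumed layout).
    min_count: hypothetical frequency threshold, not a value from the paper.
    Returns {post_id: [(term_a, term_b), ...]}, most frequent pairs first.
    """
    pair_counts = {}
    for post_id, query in query_log:
        # Normalise the query into a sorted set of lower-cased terms so that
        # "blog python" and "python blog" count as the same pair.
        terms = sorted(set(query.lower().split()))
        counter = pair_counts.setdefault(post_id, Counter())
        for pair in combinations(terms, 2):
            counter[pair] += 1
    return {
        post_id: [pair for pair, n in counter.most_common() if n >= min_count]
        for post_id, counter in pair_counts.items()
    }


if __name__ == "__main__":
    log = [
        ("p1", "python blog clustering"),
        ("p1", "blog clustering"),
        ("p1", "clustering python"),
        ("p2", "kendall tau"),
    ]
    print(select_joint_keywords(log))
```

A simple frequency threshold is used here only because it is easy to inspect; any other selection rule could be substituted without changing the shape of the input or the output.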
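Keywords keyword extraction; blog connect platform; blog clustering; joint keyword; similarity measures
Funding Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61301245)
Article URL http://www.arocmag.com/article/01-2017-12-009.html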
Author names (English) Wang Qi, Huo Weigang
Affiliations (English) 1. Dept. of Computer Science & Technology, Yuncheng University, Yuncheng Shanxi 044000, China; 2. College of Computer Science & Technology, Civil Aviation University of China, Tianjin 300300, China
English abstract Full-text keyword search is time-consuming, and tags or categories suffer from ambiguity and synonymy. To address these problems, this paper proposes selecting joint keywords on a blog connect platform for blog clustering. The method assumes that the candidate keywords (or joint keywords) with which readers query a blog post can represent the topic of that post. To verify this assumption, tracking code is first embedded in the blog connect (BC) component to collect the keywords queried by readers. FKRP is then used to select candidate keywords as joint keywords. Finally, four similarity measures (overlapping projection, mutual information projection, distributed information, and the Kendall τ coefficient) are used to validate the joint keywords extracted by the BC component. The experimental results show that the proposed method gives users a fast path to the corresponding blog. In addition, the generated joint keywords reduce the complexity and redundancy of full-text keyword search and meet the needs of blog users well.
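To make the validation step more concrete, here is a small sketch of two of the measure families the abstract names: an overlap score between two keyword sets and the Kendall τ coefficient between two keyword rankings. The paper's exact definitions are not given in this abstract, so the standard textbook forms are assumed: the overlap coefficient |A ∩ B| / min(|A|, |B|) and Kendall τ-a computed over the items both rankings share. The function names are illustrative, and each ranking is assumed to list every keyword at most once.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Overlap coefficient |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a over the keywords present in both ranked lists.

    Assumes each list contains a keyword at most once, so no ties occur.
    Returns 0.0 when fewer than two keywords are shared.
    """
    pos_a = {k: i for i, k in enumerate(rank_a)}
    pos_b = {k: i for i, k in enumerate(rank_b)}
    common = [k for k in rank_a if k in pos_b]
    n = len(common)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for x, y in combinations(common, 2):
        # A pair is concordant when both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    extracted = ["blog", "clustering", "python"]
    reference = ["blog", "python", "clustering"]
    print(overlap_coefficient(extracted, reference))
    print(kendall_tau(extracted, reference))
```

Identical rankings give τ = 1 and fully reversed rankings give τ = -1, which makes the score easy to sanity-check before applying it to real query data.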
English keywords keyword selection; blog connect platform; blog clustering; joint keyword; similarity measures
Received 2016/11/17
Revised 2017/1/16
Pages 3560-3563, 3588
CLC number TP391
Document code A