Application Research of Computers (《计算机应用研究》)

Research of thematic terms extraction algorithm from Chinese text based on increment term set frequency

Authors: LIU Xing-lin (刘兴林), PENG Hong (彭宏), MA Qian-li (马千里)
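Affiliations: 1. School of Computer Science & Engineering, South China University of Technology, Guangzhou 510640, China; 2. School of Computer Science, Wuyi University, Jiangmen, Guangdong 529020, China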
Article ID: 1001-3695(2010)09-3237-02
DOI: 10.3969/j.issn.1001-3695.2010.09.008
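Abstract: This paper proposes a thematic term extraction algorithm for text based on increment term set frequency. Its core idea is to compute the frequency increment of the thematic term set. When extracting thematic terms from the candidate thematic term set, the algorithm computes the increment that a single candidate term contributes to the frequency of the thematic term set. If the increment is below a given threshold, the extraction ends; otherwise, the candidate is added to the thematic term set and the algorithm examines the next candidate. Experimental results show that the algorithm performs well: the extracted thematic terms reflect the main content of an article more aptly.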
Keywords: increment term set frequency; thematic term; natural language processing (NLP)
Funding: Natural Science Foundation of Guangdong Province (07006474, 9451064101003233); Guangdong Provincial Key Science and Technology Project (2007B010200044)
Article URL: http://www.arocmag.com/article/1001-3695(2010)09-3237-02.html
Pages: 3237-3238
Document code: A
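The abstract describes the stopping rule but does not define how the frequency of a term set is computed or in which order candidates are examined. The Python sketch below fills those gaps with assumptions that are not taken from the paper: the text is assumed to be already segmented into units (for example sentences), each a set of words; the frequency of a term set is taken to be the number of units containing at least one term of the set; the increment is normalized by the number of units; and candidates are ranked by how many units contain them. The names rank_candidates, set_frequency and extract_thematic_terms are hypothetical. Only the control flow (stop as soon as a candidate's increment falls below the threshold, otherwise add it and continue) follows the abstract.

```python
from collections import Counter
from typing import List, Sequence, Set


def rank_candidates(units: Sequence[Set[str]]) -> List[str]:
    """Assumed ranking: order terms by the number of units containing them.

    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(term for unit in units for term in unit)
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def set_frequency(term_set: Set[str], units: Sequence[Set[str]]) -> int:
    """Assumed definition: number of units containing at least one term of term_set."""
    return sum(1 for unit in units if unit & term_set)


def extract_thematic_terms(
    units: Sequence[Set[str]], candidates: Sequence[str], threshold: float
) -> List[str]:
    """Greedy extraction that ends at the first candidate whose increment is below threshold."""
    if not units:
        return []
    thematic: List[str] = []
    current = 0  # frequency of the current thematic term set
    for term in candidates:
        new = set_frequency(set(thematic) | {term}, units)
        increment = (new - current) / len(units)  # normalized gain from this single candidate
        if increment < threshold:
            break  # stopping rule from the abstract: extraction ends here
        thematic.append(term)
        current = new
    return thematic


if __name__ == "__main__":
    # Toy, pre-segmented "sentences"; real Chinese input would first need word segmentation.
    units = [
        {"algorithm", "thematic", "term"},
        {"thematic", "term", "frequency"},
        {"algorithm", "increment"},
        {"experiment", "result"},
    ]
    print(extract_thematic_terms(units, rank_candidates(units), threshold=0.2))
    # → ['algorithm', 'term']
```

In this toy run, "algorithm" covers 2 of 4 units (increment 0.5) and "term" adds one more unit (0.25), but "thematic" adds no new unit, so extraction stops there under a threshold of 0.2. Defining set frequency as coverage is what makes the increment shrink as the set grows; with a plain sum of term counts, the increment would reduce to each term's own frequency.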