《计算机应用研究》|Application Research of Computers

融合主题特征的文本自动摘要方法研究

Research on automatic text summarization combining topic feature

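Authors Luo Fang (罗芳), Wang Jinghang (汪竞航), He Daosen (何道森), Pu Qiumei (蒲秋梅)
Affiliations 1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063; 2. Department of Supply Chain and Information Management, Hang Seng University of Hong Kong, Hong Kong 999077; 3. School of Information Engineering, Minzu University of China, Beijing 100081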
Article ID 1001-3695(2021)01-026-0129-05
DOI 10.19734/j.issn.1001-3695.2019.09.0590
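Abstract Traditional graph-model methods for text summarization consider only statistical features or shallow semantic features and make little use of deep topic-level semantic features. To address this, the paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that fuses topic features and scores sentences along multiple dimensions. First, the LDA topic model is used to mine topic semantic information from the text, and a topic importance measure is defined to capture how topic features affect the importance of a sentence. Next, topic features, statistical features, and inter-sentence similarity are combined to improve how the probability transition matrix over the graph nodes is constructed. Finally, the summary is extracted and evaluated according to the sentence node weights. Experiments show that MDSR achieves its best ROUGE scores when the weights of topic features, statistical features, and inter-sentence similarity are in the ratio 3:4:3, with ROUGE-1, ROUGE-2, and ROUGE-SU4 reaching 53.35%, 35.18%, and 33.86%, respectively. These scores exceed those of the comparison methods and indicate that incorporating topic features effectively improves the accuracy of summary extraction.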
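Keywords TextRank; text summarization; semantic features; topic model; probability transition matrix
Funding Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (18YJAZH087)
Independent Innovation Research Fund of Wuhan University of Technology (3120600100)
Article URL http://www.arocmag.com/article/01-2021-01-026.html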
English title Research on automatic text summarization combining topic feature
Author names (English) Luo Fang, Wang Jinghang, He Daosen, Pu Qiumei
Affiliations (English) 1. School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China; 2. Dept. of Supply Chain & Information Management, Hang Seng University of Hong Kong, Hong Kong 999077, China; 3. School of Information Engineering, Minzu University of China, Beijing 100081, China
English abstract Traditional graph models for text summarization focus only on statistical features or shallow semantic features and neglect the mining and use of deep topic semantic features. To address this, this paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that incorporates topic features. Specifically, the method uses the LDA model to mine the semantic information of text topics and defines topic importance to measure the impact of topic features on a sentence. It then improves the construction of the probability transition matrix of the graph model nodes by combining topic features with statistical features and inter-sentence similarity. Finally, it extracts and evaluates the summary according to the weights of the sentence nodes. The results show that the ROUGE scores of MDSR are highest when the weight ratio of topic features, statistical features, and inter-sentence similarity is 3:4:3. The ROUGE-1, ROUGE-2, and ROUGE-SU4 scores are 53.35%, 35.18%, and 33.86%, respectively, which are better than those of the comparison methods. This shows that a text summarization method incorporating topic features can effectively improve the accuracy of summary extraction.
English keywords TextRank; text summarization; semantic features; LDA; probability transition matrix
Received 2019/9/30
Revised 2019/11/25
Pages 129-133
CLC number TP391
Document code A
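
The abstract describes combining topic features, statistical features, and inter-sentence similarity at a 3:4:3 weight ratio to build the probability transition matrix, then ranking sentences by node weight. The following is a minimal sketch of that idea, not the paper's actual formulation: the abstract does not give the formulas for topic importance, the statistical features, or the matrix construction, so the per-sentence scores are taken as precomputed inputs, and the mixing rule, the damping factor of 0.85, and all function names are assumptions made for illustration.

```python
import numpy as np


def normalize_rows(m):
    """Scale each row to sum to 1; all-zero rows become uniform."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    safe = np.where(sums > 0, sums, 1.0)
    return np.where(sums > 0, m / safe, 1.0 / m.shape[1])


def build_transition_matrix(topic, stat, sim, weights=(0.3, 0.4, 0.3)):
    """Mix three signals into one row-stochastic matrix (assumed rule).

    topic, stat: hypothetical per-sentence scores (length n).
    sim: n x n inter-sentence similarity matrix.
    weights: topic : statistical : similarity, 3:4:3 as in the abstract.
    """
    topic = np.asarray(topic, dtype=float)
    stat = np.asarray(stat, dtype=float)
    sim = np.array(sim, dtype=float)  # copy so the caller's data is untouched
    n = len(topic)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    w_topic, w_stat, w_sim = weights
    # A per-sentence score makes sentence j equally attractive from every row.
    topic_part = normalize_rows(np.tile(topic, (n, 1)))
    stat_part = normalize_rows(np.tile(stat, (n, 1)))
    sim_part = normalize_rows(sim)
    mixed = w_topic * topic_part + w_stat * stat_part + w_sim * sim_part
    return normalize_rows(mixed)


def rank_sentences(P, damping=0.85, tol=1e-8, max_iter=200):
    """PageRank-style power iteration over the transition matrix P."""
    n = P.shape[0]
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1 - damping) / n + damping * (P.T @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


def extract_summary(sentences, scores, k=2):
    """Return the k highest-scoring sentences in document order."""
    top = sorted(np.argsort(scores)[::-1][:k])
    return [sentences[i] for i in top]


# Toy inputs; the numbers are made up for illustration only.
sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
topic = [0.9, 0.1, 0.5]
stat = [0.8, 0.2, 0.4]
sim = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.1], [0.6, 0.1, 1.0]]

P = build_transition_matrix(topic, stat, sim)
scores = rank_sentences(P)
print(extract_summary(sentences, scores, k=2))
```

Because each of the three components is row-normalized before mixing, weights that sum to 1 yield a row-stochastic matrix directly; the final normalization only matters if other weights are supplied.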