《计算机应用研究》|Application Research of Computers

基于超图的汉越双语新闻话题要素提取

Extraction of news topic elements for Chinese Vietnamese bilingual based on hypergraph

Authors 涂子令, 周枫, 余正涛, 严馨, 洪旭东
Affiliation 昆明理工大学 信息工程与自动化学院, 昆明 650500
Article ID 1001-3695(2017)08-2278-04
DOI 10.3969/j.issn.1001-3695.2017.08.008
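Abstract This paper addresses the extraction of news topic elements from collections of Chinese-Vietnamese bilingual news topic texts, applying PageRank random walk ranking on top of a hypergraph model. First, event elements are extracted from the news using a trigger-word activation method. A topic hypergraph model is then built on these elements: the Chinese and Vietnamese event elements serve as nodes, the sentences of the text collection serve as hyperedges, and the initial weights of nodes and hyperedges are computed with a probability evaluation function. Finally, the PageRank random walk method scores the Chinese and Vietnamese event elements, which yields the Chinese-Vietnamese topic elements. Experimental results show that the method performs significantly better than an approach that considers only single-document event element extraction, indicating that hypergraph-based PageRank extracts news topic elements accurately.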
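The ranking step described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the probability evaluation function, so the node and hyperedge weights here are placeholder values, and the transition rule (pick a hyperedge in proportion to its weight, then move to a uniformly chosen node inside it) is a common hypergraph random walk assumed for illustration. The function name `hypergraph_pagerank`, its parameters, and the toy sentences are all hypothetical.

```python
from collections import defaultdict


def hypergraph_pagerank(hyperedges, edge_weights, node_prior=None,
                        damping=0.85, tol=1e-10, max_iter=200):
    """Score hypergraph nodes with a PageRank-style random walk.

    hyperedges   : list of sets, each holding the event elements of one sentence
    edge_weights : list of positive floats, one per hyperedge (assumed given)
    node_prior   : optional dict node -> initial weight, used as the teleport
                   distribution; uniform when omitted (an assumption)
    """
    hyperedges = [set(e) for e in hyperedges if e]  # ignore empty sentences
    nodes = sorted(set().union(*hyperedges))
    if node_prior is None:
        node_prior = {v: 1.0 for v in nodes}
    total = sum(node_prior[v] for v in nodes)
    prior = {v: node_prior[v] / total for v in nodes}

    # Weighted degree: sum of the weights of the hyperedges containing the node.
    degree = defaultdict(float)
    for edge, w in zip(hyperedges, edge_weights):
        for v in edge:
            degree[v] += w

    score = dict(prior)
    for _ in range(max_iter):
        new = {v: (1.0 - damping) * prior[v] for v in nodes}
        for edge, w in zip(hyperedges, edge_weights):
            # Probability mass entering this hyperedge: node u picks it
            # with probability w / degree[u].
            mass = sum(score[u] * w / degree[u] for u in edge)
            # The mass is then spread uniformly over the nodes of the edge.
            share = damping * mass / len(edge)
            for v in edge:
                new[v] += share
        change = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if change < tol:
            break
    return score


# Toy data: each set stands for the event elements found in one sentence.
sentences = [
    {"Hanoi", "河内", "summit"},
    {"Hanoi", "summit", "2016"},
    {"河内", "weather"},
]
weights = [1.0, 1.0, 0.5]  # placeholder hyperedge weights
scores = hypergraph_pagerank(sentences, weights)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Elements that occur in many highly weighted sentences accumulate more score, so the top of `ranked` would be taken as candidate topic elements. Aligning Chinese and Vietnamese elements that refer to the same entity is outside this sketch.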
Keywords Chinese-Vietnamese bilingual; event elements; hypergraph; random walk; topic elements
Funding National Natural Science Foundation of China (61562049)
Article URL http://www.arocmag.com/article/01-2017-08-008.html
English title Extraction of news topic elements for Chinese Vietnamese bilingual based on hypergraph
Authors (English) Tu Ziling, Zhou Feng, Yu Zhengtao, Yan Xin, Hong Xudong
Affiliation (English) School of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China
English abstract This paper studies the extraction of news topic elements from Chinese-Vietnamese bilingual news topic text collections. Building on a hypergraph model, it uses the PageRank random walk ranking method. First, it extracts the news event elements with a trigger-word activation method, and on this basis it constructs a topic hypergraph model. It takes the Chinese and Vietnamese event elements as nodes and the sentences of the text collection as hyperedges, and it calculates the initial weights of nodes and hyperedges according to a probability evaluation function. Finally, it uses the PageRank random walk method to score the Chinese and Vietnamese event elements and obtains the Chinese-Vietnamese topic elements. Results show that the proposed method significantly improves extraction performance compared with a method that considers only single-text event element extraction, which demonstrates the accuracy of the hypergraph-based PageRank method for extracting news topic elements.
English keywords Chinese and Vietnamese; event elements; hypergraph; random walk; topic elements
Received 2016/5/23
Revised 2016/7/11
Pages 2278-2281
CLC number TP391.1
Document code A