《计算机应用研究》|Application Research of Computers

Weibo-oriented news summarization based on matrix factorization and submodular maximization
(Original Chinese title: 基于矩阵分解和子模最大化的微博新闻摘要方法)

Authors 刘彼洋 (Liu Biyang), 孙锐 (Sun Rui), 姬东鸿 (Ji Donghong)
Affiliation School of Computer, Wuhan University, Wuhan 430072, China
Article ID 1001-3695(2017)10-2892-05
DOI 10.3969/j.issn.1001-3695.2017.10.003
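Abstract To address the main challenges of Weibo-oriented Chinese news summarization, this paper proposes an automatic news summarization method that combines matrix factorization with submodular maximization. The method first uses an orthogonal matrix factorization model to obtain latent semantic vectors for the news text, which addresses the information sparsity of short texts and keeps the projection directions approximately orthogonal to reduce redundancy. It then evaluates sets of news sentences in terms of relevance, diversity, and related aspects, using an evaluation function composed of several monotone submodular functions and one non-submodular function that measures sentence dissimilarity. Finally, a greedy algorithm generates the summary. Experimental results on the NLPCC2015 dataset show that the method effectively improves the quality of Weibo-oriented automatic news summaries, with ROUGE scores higher than those of the other baseline systems.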
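Keywords submodularity (子模属性); orthogonal matrix factorization (正交矩阵分解); news summarization (新闻摘要); extractive summarization (抽取式摘要); Weibo (微博)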
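Funding Major Bidding Program of the National Social Science Foundation of China (11&ZD189)
General Program of the National Natural Science Foundation of China (61373108)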
Article URL http://www.arocmag.com/article/01-2017-10-003.html
Authors (English) Liu Biyang, Sun Rui, Ji Donghong
Affiliation (English) School of Computer, Wuhan University, Wuhan 430072, China
English abstract This paper presents a novel method for Weibo-oriented Chinese news summarization that combines matrix factorization and submodular maximization. It uses the orthogonal matrix factorization (OrMF) model to address the information sparsity of short texts and the information redundancy in the projection procedure, and obtains robust latent vectors for news sentences. It then evaluates sets of news sentences for their relevance and diversity; the objective function includes several submodular functions and a non-submodular function that evaluates sentence dissimilarity. Finally, a greedy algorithm selects the summary sentences. Experimental results on the NLPCC2015 dataset show that the ROUGE scores of the proposed method exceed those of the baseline systems and that the quality of Weibo-oriented news summaries is effectively improved.
English keywords submodularity; orthogonal matrix factorization; news summarization; extractive summarization; Weibo
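
The selection step described in the abstract (an objective that mixes submodular terms with a sentence-dissimilarity term, maximized greedily) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the actual functions, weights, or greedy rule, so the saturated-coverage term, the pairwise-dissimilarity term, and the names `objective`, `greedy_summary`, `alpha`, and `lam` are all assumptions chosen for illustration. Sentence vectors stand in for the latent vectors an OrMF-style model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity, clipped at 0 so every similarity is nonnegative."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / (nu * nv)) if nu and nv else 0.0


def objective(selected, sim, alpha, lam):
    """Hypothetical objective, not taken from the paper.

    coverage:   saturated coverage, a monotone submodular term when all
                similarities are nonnegative.
    dispersion: sum of pairwise dissimilarities (1 - sim) inside the
                summary, which is not submodular.
    """
    coverage = 0.0
    for row in sim:
        covered = sum(row[j] for j in selected)
        coverage += min(covered, alpha * sum(row))
    dispersion = sum(
        1.0 - sim[a][b]
        for k, a in enumerate(selected)
        for b in selected[k + 1:]
    )
    return coverage + lam * dispersion


def greedy_summary(vectors, lengths, budget, alpha=0.5, lam=1.0):
    """Add the sentence with the largest positive gain until none fits the budget."""
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    selected, used = [], 0
    candidates = set(range(len(vectors)))
    while candidates:
        base = objective(selected, sim, alpha, lam)
        best, best_gain = None, 0.0
        for j in sorted(candidates):  # sorted for deterministic tie-breaking
            if used + lengths[j] > budget:
                continue
            gain = objective(selected + [j], sim, alpha, lam) - base
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break
        selected.append(best)
        used += lengths[best]
        candidates.remove(best)
    return selected


# Toy input: sentences 0 and 1 are identical, sentence 2 is unrelated to both.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(greedy_summary(toy_vectors, lengths=[1, 1, 1], budget=2))
```

On the toy input the sketch picks one of the two identical sentences and then the unrelated one, since adding the duplicate yields no gain in either term. A plain greedy rule like this carries no approximation guarantee for a non-submodular objective in general; the abstract does not say which greedy variant the paper uses.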
Received 2016/7/12
Revised 2016/8/31
Pages 2892-2896, 2928
CLC number TP391.1
Document code A