Application Research of Computers

TSPT: three-stage compound text summarization model based on pre-training

Authors 吕瑞, 王涛, 曾碧卿, 刘相湖
Affiliation School of Computer, South China Normal University, Guangzhou 510631
Article number 1001-3695(2020)10-007-2917-05
DOI 10.19734/j.issn.1001-3695.2019.05.0193
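Abstract Extractive methods, which extract sentences from the source text, cause information redundancy; generative (abstractive) methods may produce words that do not appear in the source text, leading to grammatical errors and poor naturalness. BERT, a bidirectional Transformer model, has shown excellent performance on natural language understanding tasks, but its application to text generation remains to be explored. To address these problems, this paper proposes a three-stage compound text summarization model based on pre-training (TSPT) that combines the extractive and generative methods. Pre-training produces word vectors carrying bidirectional contextual information for the source text; a sigmoid function computes sentence scores from these vectors, and the key sentences are extracted. In the summary generation stage, the key sentences are rewritten as a cloze (fill-in-the-blank) task to generate the final summary. Experimental results show that the model achieves good results on the CNN/Daily Mail dataset.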
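The abstract states only that a sigmoid function over the pre-trained word vectors yields sentence scores used to pick key sentences. A minimal sketch of that extraction step follows, under assumptions that are not taken from the paper: each sentence's token vectors are mean-pooled into one sentence vector (the paper may pool differently, e.g. via a [CLS] token), a linear layer with hypothetical parameters `weight` and `bias` maps that vector to a logit, and the top `k` sentences are kept in their original order. The toy 2-dimensional vectors stand in for real BERT outputs and trained weights.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean_pool(token_vectors: Sequence[Vector]) -> List[float]:
    """Average one sentence's token vectors (assumed pooling, hypothetical)."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[d] for v in token_vectors) / n for d in range(dim)]


def sentence_scores(sentences: Sequence[Sequence[Vector]],
                    weight: Vector, bias: float) -> List[float]:
    """Score each sentence as sigmoid(weight . s + bias); weights are hypothetical."""
    scores = []
    for token_vectors in sentences:
        s = mean_pool(token_vectors)
        logit = sum(w * x for w, x in zip(weight, s)) + bias
        scores.append(sigmoid(logit))
    return scores


def extract_key_sentences(texts: Sequence[str], scores: Sequence[float],
                          k: int) -> List[str]:
    """Keep the k highest-scoring sentences, restored to source order."""
    top = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)[:k]
    return [texts[i] for i in sorted(top)]


# Toy "contextual vectors" standing in for BERT outputs.
texts = ["A", "B", "C"]
vectors = [
    [[1.0, 0.0], [1.0, 0.0]],  # sentence A -> mean [1.0, 0.0]
    [[0.0, 1.0]],              # sentence B -> mean [0.0, 1.0]
    [[0.5, 0.5], [0.5, 0.5]],  # sentence C -> mean [0.5, 0.5]
]
scores = sentence_scores(vectors, weight=[2.0, -1.0], bias=0.0)
print(extract_key_sentences(texts, scores, k=2))  # ['A', 'C']
```

With these toy weights the logits are 2.0, -1.0 and 0.5, so sentences A and C are selected. Restoring source order keeps the extracted sentences readable as a passage before they are handed to the rewriting stage.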
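Keywords pre-training; compound text summarization model; extractive method; generative method; bidirectional contextual information word vectors
Funding National Natural Science Foundation of China (61772211, 61503143)
Article URL http://www.arocmag.com/article/01-2020-10-007.html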
English title TSPT: three-stage compound text summarization model based on pre-training
Authors (English) Lyu Rui, Wang Tao, Zeng Biqing, Liu Xianghu
Affiliation (English) School of Computer, South China Normal University, Guangzhou 510631, China
English abstract The extractive method extracts sentences from the source text, which results in information redundancy. The generative method may generate words that do not appear in the source text, which results in grammatical problems and poor naturalness. BERT, a bidirectional Transformer model, shows excellent performance on natural language understanding tasks, but its application to text generation remains to be explored. To address these issues, this paper proposes a three-stage compound text summarization model based on pre-training (TSPT), which combines the extractive and generative methods. A sigmoid function is applied to the bidirectional contextual word vectors that pre-training produces for the source text to obtain sentence scores, from which key sentences are extracted. Finally, the key sentences are rewritten as a cloze task to generate the final summary. Experimental results show that the model achieves high scores on the CNN/Daily Mail dataset.
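The English abstract says only that the key sentences are rewritten as a cloze task. One way to drive such a rewrite with a masked language model is an iterative mask-and-predict loop: mask one position, let the model fill it, and move on. The sketch below illustrates that loop under stated assumptions; the left-to-right order, the one-token-at-a-time masking, and the `toy_predict` bigram table are all hypothetical and are not the paper's method. In practice `predict` would wrap a BERT-style masked language model that reads context on both sides of the mask.

```python
from typing import Callable, List

MASK = "[MASK]"  # BERT-style mask token string

Predictor = Callable[[List[str], int], str]


def cloze_rewrite(tokens: List[str], predict: Predictor) -> List[str]:
    """Mask each position left to right and refill it with the predictor's choice.

    Each prediction sees the tokens already rewritten on its left and the
    original tokens on its right.
    """
    out = list(tokens)
    for i in range(len(out)):
        masked = out[:i] + [MASK] + out[i + 1:]
        out[i] = predict(masked, i)
    return out


# Hypothetical stand-in for a masked language model: a bigram table keyed on
# the left neighbour. Unlike BERT, it ignores right-hand context.
BIGRAM = {"<s>": "the", "the": "dog", "dog": "sat"}


def toy_predict(masked: List[str], i: int) -> str:
    left = masked[i - 1] if i > 0 else "<s>"
    return BIGRAM.get(left, "[UNK]")


print(cloze_rewrite(["the", "cat", "sat"], toy_predict))  # ['the', 'dog', 'sat']
```

Passing the predictor as a function keeps the masking loop independent of any particular model, so the toy table can be swapped for a real masked language model without changing `cloze_rewrite`.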
English keywords pre-training; compound text summarization model; extractive method; generative method; bidirectional contextual information word vectors
 
Received 2019/5/27
Revised 2019/7/9
Pages 2917-2921
CLC number TP391
Document code A