English title | Unsupervised text simplification with sequence-to-sequence model |
Author names (English) | Li Tianyu, Li Yun, Qian Zhenyu |
Affiliation (English) | School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225137, China |
English abstract | Training a seq2seq-based text simplification model requires large-scale parallel corpora, but the task currently lacks large, well-labeled parallel corpora. To address this, this paper proposes an unsupervised text simplification algorithm whose training requires only unlabeled datasets of simple and complex sentences. First, the method trains denoising autoencoders on a simple-sentence corpus and a complex-sentence corpus separately, obtaining a simple-sentence autoencoder and a complex-sentence autoencoder. It then combines the two autoencoders to form an initial text simplification model and an initial text complication model. Finally, it uses back-translation to convert the unsupervised text simplification problem into a supervised one and iteratively optimizes the text simplification model. Experiments on a standard dataset show that the method outperforms existing unsupervised models on the common metrics BLEU and SARI, and that the model simplifies text at both the lexical and syntactic levels. |
English keywords | text simplification; unsupervised; sequence-to-sequence (seq2seq) model; denoising autoencoder |
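
The two data-side ideas named in the abstract, denoising autoencoding and back-translation, can be illustrated with a minimal sketch. The abstract does not state which noise the authors apply, so the word dropout and local shuffling below are assumptions borrowed from common practice, and `add_noise`, `back_translation_pairs`, and the `complicate` callable are hypothetical names introduced only for illustration, not the paper's implementation.

```python
import random


def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token list for denoising-autoencoder training.

    Assumed noise model (not taken from the paper): each token is dropped
    with probability drop_prob, then the survivors are locally shuffled by
    sorting on (index + uniform noise in [0, shuffle_window]).
    """
    if rng is None:
        rng = random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [tokens[0]]  # never hand the autoencoder an empty input
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda pair: pair[0])]


def back_translation_pairs(simple_sentences, complicate):
    """Build synthetic (complex, simple) pairs for supervised training.

    `complicate` stands in for the current text complication model: any
    callable mapping a simple sentence to a more complex one.
    """
    return [(complicate(s), s) for s in simple_sentences]
```

In this sketch the autoencoder would be trained to reconstruct `tokens` from `add_noise(tokens)`, and the simplification model would be trained on the pairs returned by `back_translation_pairs`, with the pairs regenerated each round as the complication model improves.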