《计算机应用研究》|Application Research of Computers

面向跨语言文本分类与标签推荐的带标签双语主题模型的研究

Research on labeled bilingual topic model for cross-lingual text classification and label recommendation

Authors 田明杰, 崔荣一
Affiliation 延边大学 计算机科学与技术学科 智能信息处理研究室, 吉林 延吉 133002
Article ID 1001-3695(2019)10-006-2911-05
DOI 10.19734/j.issn.1001-3695.2018.04.0216
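Abstract To exploit the growing body of cross-lingual text resources and the multi-label data found in news reports and scientific literature, and to mine both the correlations across languages and the associations among data attributes, this paper proposes a labeled bilingual topic model and applies it to cross-lingual text classification and label recommendation. First, assuming that the keywords of a scientific article are related in content to its abstract, the keywords are extracted and turned into labels, and each label is mapped to a topic in the topic model, which instantiates the otherwise "latent" topics. Second, the labeled bilingual topic model is trained iteratively on the abstracts. Finally, cross-lingual text classification and label recommendation are performed on newly added documents. Experimental results show that micro-F1 reaches 94.81% on the cross-lingual text classification task, and that the recommended labels reflect semantic relevance well.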
Keywords topic model; label; cross-lingual text classification; label recommendation; latent topic
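Funding Scientific Research Planning Project of the State Language Commission for the 12th Five-Year Plan (YB125-178)
Yanbian University Research Project for World-Class Discipline Construction in Foreign Languages and Literature (18YLPY13)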
Article URL http://www.arocmag.com/article/01-2019-10-006.html
English title Research on labeled bilingual topic model for cross-lingual text classification and label recommendation
Authors (English) Tian Mingjie, Cui Rongyi
Affiliation (English) Intelligent Information Processing Laboratory, Dept. of Computer Science & Technology, Yanbian University, Yanji, Jilin 133002, China
English abstract To exploit the increasingly rich multilingual information resources and the multi-label data in news reports and scientific literature, and to mine the relevance between languages and the correlations within the data, this paper proposes a labeled bilingual topic model and applies it to cross-lingual text classification and label recommendation. First, it assumes that the keywords of a scientific article are relevant to the abstract of the same article; it extracts the keywords, treats them as labels, and aligns the labels with the topics of the topic model, thereby instantiating the "latent" topics. Second, it trains the proposed topic model on the abstracts of the articles. Finally, it classifies new documents with a cross-lingual text classifier and recommends labels for them. The experimental results show that the micro-F1 measure reaches 94.81% on the cross-lingual text classification task, and that the recommended labels reflect semantic relevance to the documents.
English keywords topic model; label; cross-lingual text classification; label recommendation; latent topic
Received 2018/4/3
Revised 2018/5/7
Pages 2911-2915
CLC number TP391
Document code A