Application Research of Computers

Off-topic essay detection based on LDA coupling space

Authors: Meng Chaoying, Song Wen'ai, Fu Lizhen
Affiliation: School of Software, North University of China, Taiyuan 030051, China
Article ID: 1001-3695(2019)12-005-3544-04
DOI: 10.19734/j.issn.1001-3695.2018.08.0590
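Abstract: Most traditional off-topic detection methods convert an essay into a vector-space representation and then decide whether it is off-topic by computing its similarity to correct (on-topic) essays. However, such methods represent only the sentence-level structure of an essay and ignore its semantic associations, and they perform poorly on essays written for prompts with a high degree of divergence. To address these problems, this paper computes the relevance between the prompt and the topic words of the essay body in a coupling space, and then uses clustering to perform unsupervised off-topic essay detection. Experimental results show that the coupling-space-based method improves detection accuracy to some degree for essays on both low-divergence and high-divergence prompts, with a more pronounced improvement for high-divergence prompts.
Keywords: off-topic essay detection; coupling space; topic word extraction; relevance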
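The abstract describes two stages: scoring how relevant an essay's topic words are to the prompt, then clustering those scores without labels to separate off-topic essays. The sketch below illustrates that two-stage shape under loudly stated assumptions; it is not the authors' method. The abstract does not define the coupling space or the LDA topic-word extraction, so `vectors` is a hypothetical word-to-vector mapping, `relevance` (the mean, over essay topic words, of the best cosine match among prompt words) is a swapped-in stand-in scoring rule, and two-cluster 1-D k-means stands in for the unspecified clustering step.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance(prompt_words, topic_words, vectors):
    """Hypothetical relevance score (an assumption, not the paper's coupling-space
    measure): for each essay topic word, take its best cosine match among the
    prompt words, then average over topic words. Words missing from `vectors`
    are skipped."""
    scores = []
    for t in topic_words:
        if t not in vectors:
            continue
        best = max((cosine(vectors[t], vectors[p])
                    for p in prompt_words if p in vectors), default=0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0

def two_means_1d(scores, max_iter=100):
    """Split 1-D relevance scores into two clusters with k-means (k=2).
    Returns one boolean per score: True = low-relevance cluster (flagged off-topic).
    Ties go to the high cluster, so identical scores are never flagged."""
    lo, hi = min(scores), max(scores)  # initial centroids
    labels = [False] * len(scores)
    for _ in range(max_iter):
        new_labels = [abs(s - lo) < abs(s - hi) for s in scores]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        low = [s for s, flag in zip(scores, labels) if flag]
        high = [s for s, flag in zip(scores, labels) if not flag]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels

# Toy, made-up word vectors (hypothetical) for a prompt about "spring".
vectors = {
    "spring": [1.0, 0.1, 0.0],
    "flower": [0.9, 0.2, 0.0],
    "garden": [0.8, 0.3, 0.1],
    "football": [0.0, 0.1, 1.0],
    "goal": [0.1, 0.0, 0.9],
}
prompt = ["spring"]
essays = [["flower", "garden"], ["garden", "spring"], ["football", "goal"]]
scores = [relevance(prompt, topics, vectors) for topics in essays]
flags = two_means_1d(scores)
print(flags)  # → [False, False, True]
```

Clustering the scores, rather than applying a fixed similarity threshold, is what makes this kind of detector unsupervised: the cut between on-topic and off-topic essays is derived from the score distribution of each batch.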
Funding: National Natural Science Foundation of China (61602427)
Natural Science Foundation of Shanxi Province (201601D202037)
URL: http://www.arocmag.com/article/01-2019-12-005.html
 
Received: 2018/8/13
Revised: 2018/9/28
Pages: 3544-3547
CLC number: TP391.4
Document code: A