《计算机应用研究》|Application Research of Computers

基于粗糙集的带决策规则边界的邮件过滤算法

E-mail filtering algorithm with boundary decision rules based on rough set

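Authors 杨艳燕, 郭红转, 路新华
Affiliations 1. School of Computer & Information Engineering, Nanyang Institute of Technology (南阳理工学院 计算机与信息工程学院), Nanyang, Henan 473004; 2. College of Information Engineering, Zhengzhou University (郑州大学 信息工程学院), Zhengzhou 450002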
Article ID 1001-3695(2015)01-0258-04
DOI 10.3969/j.issn.1001-3695.2015.01.059
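Abstract To address the low accuracy and stability of spam filtering, and the false negatives and false positives that e-mail filtering algorithms produce when classifying a corpus, this paper proposes an e-mail filtering algorithm with decision rule boundaries based on rough sets (RARM). The algorithm applies rough set theory to analyze the corpus directly and uses a heuristic method to derive an execution plan for three different rough-set decision rules, so that a certain level of classification accuracy is maintained even when the lexical semantics of a message are ambiguous. In simulation experiments, the algorithm achieved higher spam filtering accuracy than e-mail filtering algorithms based on support vector machines (SVM), AdaBoost, and Bayesian classification.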
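Keywords e-mail filtering; rough set; heuristic method; decision rule boundary
Funding Henan Province Key Science and Technology Research Projects (122102210563, 132102210215)
Article URL http://www.arocmag.com/article/01-2015-01-059.html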
Authors (English) YANG Yan-yan, GUO Hong-zhuan, LU Xin-hua
Affiliations (English) 1. School of Computer & Information Engineering, Nanyang Institute of Technology, Nanyang, Henan 473004, China; 2. College of Information Engineering, Zhengzhou University, Zhengzhou 450002, China
English abstract To address the low accuracy and stability of spam filters, and the false negatives and false positives that e-mail filtering algorithms produce in corpus classification, this paper proposes an e-mail filtering algorithm with boundary decision rules based on rough set. It first uses rough set theory to analyze the corpus directly, then uses heuristic methods to propose an execution plan for three different rough-set decision rules, so that a certain classification accuracy is still guaranteed when the lexical semantics of the message content are ambiguous. In spam classification experiments, the algorithm is compared with SVM, AdaBoost and Bayesian mail filtering algorithms and achieves better spam filtering accuracy than all three.
Keywords (English) spam filtering; rough set; heuristic methods; decision rules boundary
Received 2013/11/27
Revised 2014/1/6
Pages 258-261
CLC number TP393.098; TP301.6
Document code A
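
The abstract mentions three rough-set decision rules and a decision rule boundary but gives no further detail. The sketch below is not the authors' RARM algorithm. It illustrates the standard rough-set idea such a filter might build on, under the assumption that the three rules correspond to the positive region (certainly spam), the negative region (certainly legitimate), and the boundary region (undecided). The keyword list `FEATURES`, the labels `"ham"` and `"suspicious"`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical condition attributes: presence or absence of each keyword.
FEATURES = ("free", "winner", "meeting")


def signature(tokens):
    """Map a tokenized message to its condition-attribute vector."""
    present = set(tokens)
    return tuple(int(word in present) for word in FEATURES)


def induce_rules(training):
    """Group messages into equivalence classes and assign each a region."""
    classes = defaultdict(set)
    for tokens, label in training:
        classes[signature(tokens)].add(label)

    rules = {}
    for sig, labels in classes.items():
        if labels == {"spam"}:
            rules[sig] = "spam"        # positive region: every member is spam
        elif labels == {"ham"}:
            rules[sig] = "ham"         # negative region: no member is spam
        else:
            rules[sig] = "suspicious"  # boundary region: labels conflict
    return rules


def classify(rules, tokens):
    """Apply the induced rules to a new message."""
    # Assumption: an unseen attribute vector is treated as boundary.
    return rules.get(signature(tokens), "suspicious")


training = [
    (["free", "winner"], "spam"),
    (["free", "winner", "now"], "spam"),
    (["meeting", "agenda"], "ham"),
    (["free", "meeting"], "spam"),
    (["free", "meeting", "lunch"], "ham"),
]
rules = induce_rules(training)
```

With this toy table, `classify(rules, ["winner", "free"])` falls in the positive region, `classify(rules, ["meeting"])` in the negative region, and `classify(rules, ["free", "meeting"])` in the boundary region, because two training messages with the same attribute vector carry conflicting labels. Routing boundary messages to a third outcome rather than forcing a binary decision is one way to reduce both false positives and false negatives when message wording is ambiguous.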