《计算机应用研究》 | Application Research of Computers

Semi-supervised classification method for imbalanced data based on evidence theory

Authors Du Limin, Xu Yang
Affiliations 1. Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, China; 2. College of Pharmacy, Henan University, Kaifeng, Henan 475004, China
Article number 1001-3695(2018)02-0342-04
DOI 10.3969/j.issn.1001-3695.2018.02.005
Abstract For imbalanced data with only a few labelled samples, this paper proposes a semi-supervised classification method based on evidence theory and biased-SVM. First, the random subspace method is used to obtain different views; then, on each view, a biased-SVM model is trained with the initial labelled sample set and applied to the unlabelled sample set to obtain class-probability outputs for the unlabelled samples; finally, evidence theory is introduced for information fusion to improve the stability of the labelling. Experimental results on several public data sets show that, compared with other methods, the proposed method achieves a higher overall G-mean and a higher minority-class F-value at different labelled-sample rates, and does so with greater stability.
Keywords semi-supervised classification; imbalanced data; evidence theory; biased-SVM
Funding National Natural Science Foundation of China (61673320)
Youth Science Fund of the National Natural Science Foundation of China (61305074)
Article URL http://www.arocmag.com/article/01-2018-02-005.html
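
The pipeline summarized in the abstract (random-subspace views, a biased-SVM trained on each view, and evidence-theoretic fusion of the per-view class-probability outputs) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes scikit-learn and NumPy, approximates biased-SVM with the class_weight option of SVC, treats each view's class probabilities directly as singleton basic probability assignments for Dempster's rule, and the view count, subspace ratio, minority-class weight and the helper names pseudo_label and dempster_combine are illustrative choices rather than values from the paper.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def dempster_combine(m1, m2):
    # Dempster's rule for two bodies of evidence over the frame {majority (0), minority (1)},
    # assuming all mass sits on the two singletons (a simplification of the general rule).
    conflict = m1[0] * m2[1] + m1[1] * m2[0]
    return np.array([m1[0] * m2[0], m1[1] * m2[1]]) / max(1.0 - conflict, 1e-12)

def pseudo_label(X_lab, y_lab, X_unlab, n_views=3, subspace_ratio=0.6, minority_weight=10.0):
    # For each random-subspace view, train a class-weighted SVM (standing in for biased-SVM)
    # on the labelled samples, collect class-probability outputs for the unlabelled samples,
    # and fuse the per-view outputs with Dempster's rule.
    n_features = X_lab.shape[1]
    k = max(1, int(subspace_ratio * n_features))
    fused = None
    for _ in range(n_views):
        feats = rng.choice(n_features, size=k, replace=False)       # one random-subspace view
        clf = SVC(class_weight={1: minority_weight}, probability=True)
        clf.fit(X_lab[:, feats], y_lab)
        proba = clf.predict_proba(X_unlab[:, feats])                # columns follow clf.classes_ == [0, 1]
        fused = proba if fused is None else np.array(
            [dempster_combine(f, m) for f, m in zip(fused, proba)])
    labels = (fused[:, 1] > 0.5).astype(int)                        # fused belief in the minority class
    return labels, fused

In a full semi-supervised loop one would presumably pseudo-label only the unlabelled samples whose fused belief is most decisive, add them to the labelled set, and retrain; the confidence threshold and stopping rule are not specified by the abstract, so they are left out of the sketch.
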
References
  [1] Provost F, Fawcett T. Robust classification for imprecise environments[J] . Machine Learning, 2001, 42(3):203-231.
[2] Abe N. An iterative method for multi-class cost-sensitive learning[C] //Proc of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2004:3-11.
[3] Hong Xia, Chen Sheng, Harris C J. A kernel-based two-class classifier for imbalanced data sets[J] . IEEE Trans on Neural Networks, 2007, 18(1):28-41.
[4] Ertekin S. Active learning for class imbalance problem[C] //Proc of International ACM SIGIR Conference on Research and Development in Information Retrieval. New York:ACM Press, 2007:823-824.
[5] Zhou Zhihua, Liu Xuying. On multi-class cost-sensitive learning[J] . Computational Intelligence, 2010, 26(3):232-257.
[6] Tao Xinmin, Zhang Dongxue, Hao Siyuan, et al. SVM classification algorithm for imbalanced data based on spectral clustering under-sampling[J] . Control and Decision, 2012, 27(12):1761-1768.
[7] Zhou Zhihua, Li Ming. Tri-training:exploiting unlabeled data using three classifiers[J] . IEEE Trans on Knowledge and Data Engineering, 2005, 17(11):1529-1541.
[8] Nigam K, Ghani R. Analyzing the effectiveness and applicability of co-training[C] //Proc of the 9th International Conference on Information and Knowledge Management. New York:ACM Press, 2000:86-93.
[9] Cohen I, Cozman F G, Sebe N, et al. Semisupervised learning of classifiers:theory, algorithms, and their application to human-computer interaction[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 2004, 26(12):1553-1566.
[10] Zhu Xiaojin. Semi-supervised learning literature survey[J] . Computer Science, 2008, 37(1):63-77.
[11] Vapnik V N. Statistical learning theory[M] . New York:Wiley, 1998.
[12] Veropoulos K, Campbell C, Cristianini N. Controlling the sensitivity of support vector machines[C] //Proc of International Joint Conference on Artificial Intelligence. 1999:55-60.
[13] Dempster A P. Upper and lower probabilities induced by a multivalued mapping[J] . The Annals of Mathematical Statistics, 1967, 38(2):325-339.
[14] Shafer G. A mathematical theory of evidence[M] . Princeton:Princeton University Press, 1976.
[15] Jiang Wanlu, Wu Shengqiang. Fault diagnosis method based on SVM and evidence theory with multi-data fusion[J] . Chinese Journal of Scientific Instrument, 2010, 31(8):1738-1743.
[16] Kong Zhizhou, Cai Zixing. Semi-supervised multiple classifier fusion method based on statistical evidence[J] . Control and Decision, 2011, 26(11):1616-1620.
[17] Akbani R, Kwek S, Japkowicz N. Applying support vector machines to imbalanced data sets[C] //Lecture Notes in Computer Science, vol 3201. 2004:39-50.
[18] Ho T K. The random subspace method for constructing decision forests[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 1998, 20(8):832-844.
[19] Su C T, Chen Longsheng, Yih Y. Knowledge acquisition through information granulation for imbalanced data[J] . Expert Systems with Applications, 2006, 31(3):531-541.
[20] Han Hui, Wang Wenyuan, Mao Binghuan. Borderline-SMOTE:a new over-sampling method in imbalanced data sets learning[C] //Proc of International Conference on Intelligent Computing. 2005:878-887.
[21] Asuncion A, Newman D. UCI repository of machine learning databases[DB/OL] . [2009-04-03] . http://www.ics.uci.edu/~mlearn/MLRepository.html.
Received 2016/10/20
Revised 2016/12/13
Pages 342-345
CLC number TP391
Document code A