《计算机应用研究》|Application Research of Computers

Classification of unbalanced data based on MTS-AdaBoost

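Authors Gu Yuping (顾玉萍), Cheng Longsheng (程龙生)
Affiliation School of Economics and Management, Nanjing University of Science and Technology (南京理工大学 经济管理学院), Nanjing 210094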
Article ID 1001-3695(2018)02-0346-03
DOI 10.3969/j.issn.1001-3695.2018.02.006
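Abstract Unbalanced data are widespread in practical applications, yet most traditional classification algorithms assume a balanced class distribution, so classifying unbalanced data has become one of the bottlenecks in data mining. The Mahalanobis-Taguchi system (MTS) is a multivariate pattern recognition method; combining it with the AdaBoost ensemble algorithm yields the MTS-AdaBoost algorithm. The algorithm uses MTS as the base classifier and, based on the predictions of the previous base classifier, automatically adjusts the probability with which each sample is drawn for the next base classifier, thereby changing the balance between the classes. Finally, the algorithm is applied in an empirical study of financial crisis early warning for listed companies from 2010 to 2015. The results show that MTS-AdaBoost outperforms traditional MTS in both system dimensionality reduction and classification performance, and also outperforms other commonly used single classifiers.
Keywords Mahalanobis-Taguchi system; AdaBoost ensemble algorithm; unbalanced data; financial crisis early warning; classification
Funding National Natural Science Foundation of China (71271114)
Article URL http://www.arocmag.com/article/01-2018-02-006.html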
English title Classification of unbalanced data based on MTS-AdaBoost
Authors (English) Gu Yuping, Cheng Longsheng
Affiliation (English) School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
English abstract Unbalanced data are common in practical applications, but most traditional classification algorithms assume a balanced class distribution. Solving the problem of unbalanced data classification has therefore become one of the bottlenecks in data mining. MTS is a multivariate pattern recognition method, which is combined with the AdaBoost ensemble algorithm to form the MTS-AdaBoost algorithm. The algorithm uses MTS as the base classifier and adjusts the probability with which each sample is drawn for the next base classifier according to the prediction result of the previous base classifier, so as to change the degree of balance between the classes. Finally, this paper applies the method to financial crisis early warning for listed companies from 2010 to 2015. The results show that the MTS-AdaBoost algorithm is superior to traditional MTS in both dimensionality reduction and classification, and is also superior to other commonly used single classifiers.
English keywords Mahalanobis-Taguchi system (MTS); AdaBoost ensemble algorithm; unbalanced data; financial crisis warning; classification
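The resampling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base classifier below is a simplified stand-in for MTS that only thresholds the scaled Mahalanobis distance to the normal class and omits the feature-screening step of a full MTS, and the weight update is the standard AdaBoost rule, assumed here because the abstract does not give the exact formula. The names `MahalanobisBase`, `fit_mts_adaboost` and `predict_mts_adaboost` are hypothetical, and labels are assumed to be -1 (normal, majority) and +1 (abnormal, minority).

```python
import numpy as np


class MahalanobisBase:
    """Simplified MTS-style base classifier (assumption: no feature screening)."""

    def fit(self, X, y):
        normal = X[y == -1]                      # reference space from normal samples
        self.mean_ = normal.mean(axis=0)
        self.std_ = normal.std(axis=0) + 1e-12   # avoid division by zero
        Z = (normal - self.mean_) / self.std_
        corr = Z.T @ Z / len(Z)                  # correlation matrix of the normal group
        self.inv_corr_ = np.linalg.pinv(corr)
        # Pick the distance threshold with the lowest error on this training sample.
        md = self.distance(X)
        candidates = np.unique(md)
        errors = [np.mean(np.where(md > t, 1, -1) != y) for t in candidates]
        self.threshold_ = candidates[int(np.argmin(errors))]
        return self

    def distance(self, X):
        Z = (X - self.mean_) / self.std_
        # Mahalanobis distance scaled by the number of features
        return np.einsum("ij,jk,ik->i", Z, self.inv_corr_, Z) / X.shape[1]

    def predict(self, X):
        return np.where(self.distance(X) > self.threshold_, 1, -1)


def fit_mts_adaboost(X, y, n_rounds=10, seed=0):
    """AdaBoost by resampling: weights are the per-sample draw probabilities."""
    rng = np.random.default_rng(seed)
    n = len(y)
    weights = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=weights)
        if np.sum(y[idx] == -1) < 2:
            continue                             # too few normal samples drawn
        clf = MahalanobisBase().fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = np.sum(weights[pred != y])         # weighted error on the full set
        if err >= 0.5:
            continue                             # no better than chance, discard
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, clf))
        # Misclassified samples become more likely to be drawn next round.
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
    return ensemble


def predict_mts_adaboost(ensemble, X):
    score = np.zeros(len(X))
    for alpha, clf in ensemble:
        score += alpha * clf.predict(X)
    return np.where(score > 0, 1, -1)
```

Because the weights act as sampling probabilities, samples that earlier base classifiers got wrong, often minority-class ones, appear more frequently in later training samples, which is how the balance between classes shifts from round to round.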
Received 2016/10/21
Revised 2016/11/30
Pages 346-348, 353
CLC number TP391
Document code A