《计算机应用研究》|Application Research of Computers

Feature selection algorithm based on filter+wrapper pattern (基于filter+wrapper模式的特征选择算法)

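Authors Zhou Chuanhua (周传华), Liu Zhicai (柳智才), Ding Jing'an (丁敬安), Zhou Jiayi (周家亿)
Affiliations 1. School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002, China; 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; 3. Graduate School of Information, Production and Systems (IPS), Waseda University, Tokyo, Japan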
Article ID 1001-3695(2019)07-012-1975-05
DOI 10.19734/j.issn.1001-3695.2018.01.0024
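Abstract Feature selection is a persistent and important problem in data mining, machine learning, and pattern recognition. Traditional information gain exhibits a selection bias when classes and features are unevenly distributed. To address this, the paper proposes a feature selection algorithm based on information gain ratio and random forest. The algorithm combines the strengths of the filter and wrapper modes: it first scores features comprehensively in terms of both information relevance and classification ability, then selects features with a sequential forward selection (SFS) strategy, using classification accuracy as the criterion for evaluating feature subsets, and thereby obtains the optimal feature subset. Experimental results show that the algorithm not only reduces the dimensionality of the feature space but also effectively improves the classification performance and recall of the classifier.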
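The filter-then-wrapper pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: features are discrete; the filter score is the gain ratio alone (the paper also uses a random-forest-based measure, but the abstract does not say how the two are combined, so that part is omitted); a leave-one-out 1-nearest-neighbour classifier stands in for the paper's classifier; and the names `gain_ratio`, `loo_1nn_accuracy`, `filter_wrapper_select`, and the `keep` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


def loo_1nn_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of 1-NN (Hamming distance) using only the features in `subset`.

    Assumption: this is a stand-in evaluator, not the classifier used in the paper.
    """
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum(row[k] != rows[j][k] for k in subset),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def filter_wrapper_select(rows, labels, keep, evaluate=loo_1nn_accuracy):
    """Filter stage: keep the `keep` features with the highest gain ratio.
    Wrapper stage: sequential forward selection driven by classification accuracy."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda k: gain_ratio(columns[k], labels), reverse=True)
    candidates = ranked[:keep]
    selected, best = [], 0.0
    while candidates:
        # Try adding each remaining candidate and keep the one with the best accuracy.
        scored = [(evaluate(rows, labels, selected + [k]), k) for k in candidates]
        score, k = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no candidate improves accuracy, so stop
        selected, best = selected + [k], score
        candidates.remove(k)
    return selected, best


# Toy data: feature 0 matches the class, feature 1 is noise, feature 2 is an ID-like column.
rows = [(0, 0, 0), (0, 1, 1), (0, 0, 2), (0, 1, 3),
        (1, 0, 4), (1, 1, 5), (1, 0, 6), (1, 1, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
selected, accuracy = filter_wrapper_select(rows, labels, keep=2)
print(selected, accuracy)  # [0] 1.0
```

On this toy data the ID-like feature has the same information gain as feature 0 (1 bit) but a gain ratio of only 1/3, which shows the kind of preference for many-valued features that the gain ratio is meant to damp.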
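Keywords information gain ratio; random forest; feature selection; filter mode; wrapper mode
Funding National Natural Science Foundation of China (71371013, 71772002)
Anhui Province Merit-Based Funding Program for Innovation Projects of Returned Overseas Scholars (2016)
Article URL http://www.arocmag.com/article/01-2019-07-012.html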
English title Feature selection algorithm based on filter+wrapper pattern
Authors (English) Zhou Chuanhua, Liu Zhicai, Ding Jing'an, Zhou Jiayi
Affiliations (English) 1. School of Management Science & Engineering, Anhui University of Technology, Maanshan Anhui 243002, China; 2. School of Computer Science & Technology, University of Science & Technology of China, Hefei 230026, China; 3. Graduate School of Information, Production & Systems, Waseda University, Tokyo, Japan
English abstract Feature selection is one of the most important issues in data mining, machine learning, and pattern recognition. To address the selection bias of the traditional information gain algorithm when classes and features are unevenly distributed, this paper proposes a new feature selection algorithm based on information gain ratio and random forest. The proposed algorithm combines the advantages of the filter and wrapper modes. First, it measures features comprehensively from two aspects: information correlation and classification ability. Second, it uses a sequential forward selection (SFS) strategy to select features, with classification accuracy as the evaluation index for each feature subset. Finally, it obtains the optimal feature subset. The experimental results show that the proposed algorithm not only reduces the dimensionality of the feature space but also effectively improves the classification performance and recall of the classification algorithm.
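For reference, the information gain ratio named in the abstract is commonly defined as follows (the C4.5 formulation). The notation is an assumption chosen for illustration and is not taken from the paper: C is the class variable, A is a discrete feature, and H denotes Shannon entropy.

```latex
\mathrm{IG}(A) = H(C) - H(C \mid A), \qquad
H(A) = -\sum_{v} p(A = v)\,\log_2 p(A = v), \qquad
\mathrm{GainRatio}(A) = \frac{\mathrm{IG}(A)}{H(A)}
```

Dividing by H(A) penalizes features that take many distinct values, which is the usual remedy for the preference that plain information gain shows toward such features.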
English keywords information gain ratio; random forest; feature selection; filter mode; wrapper mode
Received 2018/1/16
Revised 2018/3/12
Pages 1975-1979, 2010
CLC number TP312; TP301.6
Document code A