《计算机应用研究》|Application Research of Computers

Research on extreme random forest algorithm based on feature selection (基于特征选择的极限随机森林算法研究)

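Authors Yang Fengrui (杨丰瑞), Luo Sifan (罗思烦), Li Qianyang (李前洋)
Affiliations 1. Chongqing University of Posts & Telecommunications, a. School of Communication & Information Engineering; b. Research Center of New Telecommunication Technology, Chongqing 400065; 2. Chongqing University of Posts & Telecommunications Information Technology (Group) Co. Ltd., Chongqing 401121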
Article ID 1001-3695(2020)09-012-2625-04
DOI 10.19734/j.issn.1001-3695.2019.04.0122
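Abstract Processing high-dimensional, complex data is a key problem in data mining. To address the imbalanced prediction accuracy and low overall classification efficiency of existing feature selection classification algorithms, this paper proposes a feature selection classification algorithm (P-ERF) that combines probability correlation with the extreme random forest. The algorithm selects features by jointly considering the correlation between features and the P-value, which avoids the redundancy introduced during tree node splitting. It uses random trees as base classifiers and the extreme random forest as the overall framework, which gives P-ERF higher accuracy and better generalization error. Experimental results show that, compared with the random forest and extreme random forest algorithms, P-ERF achieves good results in both classification accuracy and overall performance on the data sets.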
Keywords probability correlation; feature selection; feature subset; extreme random forest
Article URL http://www.arocmag.com/article/01-2020-09-012.html
English title Research on extreme random forest algorithm based on feature selection
Author names (English) Yang Fengrui, Luo Sifan, Li Qianyang
Affiliations (English) 1. a. School of Communication & Information Engineering, b. Research Center of New Telecommunication Technology, Chongqing University of Posts & Telecommunications, Chongqing 400065, China; 2. Chongqing University of Posts & Telecommunications Information Technology (Group) Co. Ltd., Chongqing 401121, China
English abstract High-dimensional complex data processing is an important problem in the field of data mining. Aiming at the problems of imbalanced prediction accuracy and low overall classification efficiency in existing feature selection integrated classification algorithms, this paper proposed a feature selection classification algorithm combining probability correlation and extreme random forest. The algorithm used a feature selection method that fully considered the correlation between the features together with the P-value, which avoided the redundancy caused by the tree node splitting process. In addition, the algorithm took the random tree as the base classifier and the extreme random forest as the overall framework, which allowed it to achieve higher accuracy and better generalization error. The experimental results show that the algorithm achieves good results in data set classification accuracy and integrity compared with the random forest algorithm and the extreme random forest algorithm.
English keywords probability correlation; feature selection; feature subset; extreme random forest
Received 2019/4/16
Revised 2019/6/11
Pages 2625-2628, 2633
CLC number TP181
Document code A
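
The abstract describes the feature selection step only in words: features are filtered by combining their correlation with a P-value, so that redundant features do not reach the tree-splitting stage. The record gives no formulas, so the following is only a minimal sketch of that general idea and not the authors' P-ERF procedure. Everything specific in it is an assumption made for illustration: Pearson correlation as the correlation measure, a permutation test as the source of the P-value, the thresholds `alpha` and `max_corr`, and the hypothetical function names `pearson`, `permutation_p_value` and `select_features`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=200, rng=None):
    """Two-sided permutation p-value for the correlation between x and y.

    Assumption: a permutation test stands in for whatever P-value the
    paper actually uses.
    """
    if rng is None:
        rng = random.Random(0)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def select_features(columns, labels, alpha=0.05, max_corr=0.9, seed=0):
    """Return sorted indices of significant, mutually non-redundant features.

    columns  : list of feature columns, each a list of numbers
    labels   : list of numeric class labels (e.g. 0/1)
    alpha    : significance threshold on the p-value (assumed value)
    max_corr : features correlated above this with an already kept
               feature are treated as redundant (assumed value)
    """
    rng = random.Random(seed)
    scored = []
    for j, col in enumerate(columns):
        p = permutation_p_value(col, labels, rng=rng)
        if p <= alpha:
            # Rank by p-value, then by strength of correlation with the label.
            scored.append((p, round(-abs(pearson(col, labels)), 6), j))
    selected = []
    for _, _, j in sorted(scored):
        if all(abs(pearson(columns[j], columns[k])) < max_corr for k in selected):
            selected.append(j)
    return sorted(selected)
```

Given one informative column, a rescaled copy of it, and a column unrelated to the labels, `select_features` would be expected to keep a single one of the two correlated columns and drop the unrelated one. The retained columns could then be passed to an extremely randomized trees classifier, for example scikit-learn's `ExtraTreesClassifier`, which plays the role of the "extreme random forest" framework named in the abstract.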