Application Research of Computers

Using genetic algorithm for feature selection optimization on big data processing

Authors: Zhang Wenjie, Jiang Liehui
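Affiliations: 1. Faculty of Cyberspace Security, PLA Information Engineering University, Zhengzhou 450001, China; 2. State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450001, China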
Article ID: 1001-3695(2020)01-010-0050-03
DOI 10.19734/j.issn.1001-3695.2018.05.0495
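Abstract: This paper proposes a genetic-algorithm-based feature selection algorithm for big data. The algorithm first evaluates the feature in every dimension and adjusts each feature's weight according to how much it differs between an instance's nearest neighbor of the same class and its nearest neighbor of a different class. These feature weights then guide the genetic algorithm's search, improving both its search ability and the accuracy of the features it selects. Next, the algorithm combines the feature weights to compute fitness and, using fitness as the evaluation criterion, runs the genetic algorithm to obtain the optimal feature subset, achieving efficient and accurate feature selection on big data. Experimental analysis shows that the algorithm effectively reduces the number of features used for classification and improves classification accuracy.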
Keywords: big data; feature selection; genetic algorithm; feature subset
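The first step of the abstract scores each feature by how much it differs on an instance's nearest same-class neighbor versus its nearest different-class neighbor. The abstract does not give the exact update rule, so the following is only a minimal sketch, assuming the classic Relief update (Kira and Rendell) applied deterministically to every instance, with numeric features range-normalized to [0, 1] and at least two samples per class. The name `relief_weights` is hypothetical, not the authors' code.

```python
import math


def relief_weights(X, y):
    """Relief-style feature weights (assumed stand-in for the paper's rule).

    For every instance, find its nearest hit (same class) and nearest miss
    (other class); a feature gains weight when it differs on the miss and
    loses weight when it differs on the hit.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale absolute differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a, b):
        return math.sqrt(sum(diff(j, a, b) ** 2 for j in range(d)))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(i, k))    # nearest same-class neighbor
        m = min(misses, key=lambda k: dist(i, k))  # nearest other-class neighbor
        for j in range(d):
            # Average over all n instances of (miss difference - hit difference).
            w[j] += (diff(j, i, m) - diff(j, i, h)) / n
    return w
```

On a toy set such as `relief_weights([[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]], [0, 0, 1, 1])`, the first (class-separating) feature gets a positive weight and the second (noisy) one a negative weight. The brute-force neighbor search costs O(n²d), so this illustrates the weighting idea only and is not a big-data implementation.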
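Foundation items: Basic and Frontier Research Program of Henan Province; Science and Technology Research Program of Henan Province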
Article URL: http://www.arocmag.com/article/01-2020-01-010.html
Received: 2018/5/6
Revised: 2018/6/28
Pages: 50-52, 56
CLC number: TP391
Document code: A
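The abstract's second step uses the feature weights both to guide the genetic search and inside the fitness function, but it specifies neither the encoding, the operators, nor the fitness formula. The sketch below is therefore built entirely on assumptions: binary feature masks, tournament selection, uniform crossover, elitism, a mutation that resamples bits with weight-derived probabilities (hypothetically `0.1 + 0.8 * normalized weight`), and a hypothetical fitness of leave-one-out 1-NN accuracy plus `alpha` times the mean normalized weight of the selected features minus `beta` times the fraction of features kept. The names `ga_select`, `weighted_fitness`, `loo_1nn_accuracy`, `alpha`, and `beta` are illustrative only.

```python
import random


def loo_1nn_accuracy(X, y, idx):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on columns idx."""
    correct = 0
    for i in range(len(X)):
        best_k, best_d = None, float("inf")
        for k in range(len(X)):
            if k == i:
                continue
            dd = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if dd < best_d:
                best_k, best_d = k, dd
        correct += y[best_k] == y[i]
    return correct / len(X)


def weighted_fitness(X, y, mask, norm_w, alpha=0.1, beta=0.05):
    """Hypothetical fitness: accuracy + alpha * mean weight - beta * subset fraction."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    acc = loo_1nn_accuracy(X, y, idx)
    mean_w = sum(norm_w[j] for j in idx) / len(idx)
    return acc + alpha * mean_w - beta * len(idx) / len(mask)


def ga_select(X, y, weights, pop_size=20, generations=30, pm=0.1, seed=0):
    """Weight-guided GA over binary feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    d = len(weights)
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0
    norm_w = [(w - lo) / span for w in weights]  # weights rescaled to [0, 1]
    # Weight guidance: high-weight features are switched on more often.
    probs = [0.1 + 0.8 * nw for nw in norm_w]

    def fit(mask):
        return weighted_fitness(X, y, mask, norm_w)

    def tournament(scored):
        a, b = rng.sample(scored, 2)  # binary tournament
        return a[1] if a[0] >= b[0] else b[1]

    pop = [[rng.random() < p for p in probs] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        scored = [(fit(m), m) for m in pop]
        children = [best[:]]  # elitism: keep the best mask seen so far
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover.
            child = [p1[j] if rng.random() < 0.5 else p2[j] for j in range(d)]
            # Guided mutation: with rate pm, resample bit j from probs[j].
            child = [(rng.random() < probs[j]) if rng.random() < pm else child[j]
                     for j in range(d)]
            children.append(child)
        pop = children
        cand = max(pop, key=fit)
        if fit(cand) > fit(best):
            best = cand[:]
    return best, fit(best)
```

With weights such as `[0.8, -0.6, -0.5]` (for instance, produced by a Relief-style step) on a small set where only the first feature separates the classes, the search should settle on the mask `[True, False, False]`, since that subset reaches full leave-one-out accuracy while keeping the fewest, highest-weight features. Biasing initialization and mutation by weight is one plausible reading of "weights guide the search"; the paper may use a different mechanism.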