《计算机应用研究》|Application Research of Computers

基于分化距离的离群点检测算法

Differentiation distance-based outliers detection algorithm

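Authors 刘欢 (LIU Huan), 吴介军 (WU Jie-jun), 苏锦旗 (SU Jin-qi)
Affiliation College of Automation, Northwestern Polytechnical University, Xi'an 710072, China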
Article ID 1001-3695(2010)09-3316-03
DOI 10.3969/j.issn.1001-3695.2010.09.030
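Abstract To meet the need for fast outlier detection on large-scale data sets, this paper proposes an outlier detection algorithm based on differentiation distance. The algorithm jointly considers how factors such as the density around a data object and the distances between data objects affect outliers. By comparing the differentiation distance between each object and the other objects, it computes the density of neighboring points around each object and mines the outliers hidden in the data set. Experiments show that the algorithm identifies outliers effectively and also reflects how isolated each data object is within the data set. The algorithm has low complexity and is suitable for fast outlier detection on large-scale data sets.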
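Keywords outlier detection; differentiation distance; differentiation degree; neighboring point
Funding National Natural Science Foundation of China (10702065); Natural Science Foundation of Shaanxi Province (2005F45)
Article URL http://www.arocmag.com/article/1001-3695(2010)09-3316-03.html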
Author names (English) LIU Huan, WU Jie-jun, SU Jin-qi
Affiliation (English) College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
English abstract To meet the need for rapid outlier detection on large-scale data sets, this paper proposes a differentiation distance-based outlier detection algorithm (DODA), which takes into account factors that affect outliers, such as the density around each data object and the distances between objects. By comparing the differentiation distance between each object and the other objects, the algorithm calculates the density of neighboring points around each object and discovers the outliers hidden in the data set. Experimental results show that the algorithm identifies outliers effectively and also reflects each data object's degree of isolation in the data set. The algorithm's complexity is low, so it is suitable for fast outlier detection on large data sets.
English keywords outlier detection; differentiation distance; differentiation degree; neighboring point
Pages 3316-3318
Document code A