Application Research of Computers (《计算机应用研究》)

Similarity matching for uncertain time series (不确定时间序列的相似性匹配研究)

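Authors SONG Zhuan (宋转), LIAO Xiao-fei (廖小飞), XIAO Rui (肖瑞)
Affiliation a. College of Information Science & Technology; b. College of Computer Science & Technology, Donghua University (东华大学), Shanghai 201620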
Article ID 1001-3695(2014)11-3349-04
DOI 10.3969/j.issn.1001-3695.2014.11.034
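Abstract Similarity matching methods for certain (deterministic) time series do not account for data uncertainty. Yet data collected by real-world devices such as temperature sensors are often uncertain, and the distance between two uncertain time series is itself uncertain, so existing matching methods for certain time series are unsuitable for these domains. To address this problem, the paper proposes a statistics-based reduction algorithm and, building on it, two new algorithms for similarity matching of uncertain time series. In the reduction step, the reduction algorithm optimizes the handling of low-probability points and singular points of uncertain time series in different contexts. In the matching step, the paper first proposes a ring matching algorithm, which performs similarity matching by constructing a matching ring and improves matching accuracy and efficiency through multiple restarts. Then, on the basis of the reduction algorithm, it proposes an improved expectation matching algorithm, which adds an envelope constraint to eliminate the misjudgments that occur in expectation matching.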
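The abstract gives no formulas, so the following Python sketch is only one plausible reading of "expectation matching with an envelope constraint", not the authors' algorithm. It assumes each timestamp holds a list of sampled values, that reduction drops samples farther than `k` standard deviations from the slot mean, and that the envelope constraint means the two series' [min, max] bands must overlap at every timestamp. All function names, the parameter `k`, and the threshold `eps` are hypothetical.

```python
import math
import statistics


def reduce_slot(samples, k=2.0):
    """Summarize one timestamp's samples as (expectation, lower, upper)."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    # Assumed outlier rule: keep samples within k standard deviations of the mean.
    # If nothing survives (e.g. floating-point edge cases), fall back to all samples.
    kept = [s for s in samples if abs(s - mean) <= k * std] or list(samples)
    return statistics.fmean(kept), min(kept), max(kept)


def reduce_series(series, k=2.0):
    """Reduce an uncertain series (list of sample lists) slot by slot."""
    return [reduce_slot(slot, k) for slot in series]


def expected_distance(a, b):
    """Euclidean distance between the expectation curves of two reduced series."""
    return math.sqrt(sum((ea - eb) ** 2 for (ea, _, _), (eb, _, _) in zip(a, b)))


def envelopes_overlap(a, b):
    """Assumed envelope constraint: [lower, upper] bands intersect at every slot."""
    return all(la <= ub and lb <= ua for (_, la, ua), (_, lb, ub) in zip(a, b))


def is_match(series_a, series_b, eps, k=2.0):
    """Match only if the expectations are close AND the envelopes overlap."""
    a = reduce_series(series_a, k)
    b = reduce_series(series_b, k)
    if len(a) != len(b):
        raise ValueError("series must have the same number of timestamps")
    return expected_distance(a, b) <= eps and envelopes_overlap(a, b)
```

Under these assumptions, two series whose expectations lie within `eps` of each other but whose sample bands never intersect are rejected, which is the kind of misjudgment a plain expectation comparison would let through.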
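Keywords time series; uncertainty; matching; dimensionality reduction; Euclidean distance
Funding National Natural Science Foundation of China (71171045, 61201249)
Doctoral Program Foundation of the Ministry of Education of China (20120075120007)
Article URL http://www.arocmag.com/article/01-2014-11-034.html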
English title Similarity matching for uncertain time series
Author names (English) SONG Zhuan, LIAO Xiao-fei, XIAO Rui
Affiliation (English) a. College of Information Science & Technology, b. College of Computer Science & Technology, Donghua University, Shanghai 201620, China
English abstract Similarity matching techniques for certain time series do not consider data uncertainty, yet data collected by real-world sensors are often uncertain, so existing time series similarity matching methods do not apply in these areas. To solve this problem, this paper proposes a statistics-based reduction algorithm and improves the Euclidean distance calculation. It then presents new similarity matching algorithms. In the reduction step, it optimizes the treatment of small-probability points and singular points in every slot. In circle similarity matching, it improves accuracy and reduces time cost by restarting multiple times. Finally, it presents an improved expectation-based matching algorithm, built on the reduction algorithm, that resolves the misjudgment problem.
English keywords time series; uncertainty; matching; reduction; Euclidean distance
Received 2013/11/20
Revised 2013/12/30
Pages 3349-3352
CLC number TP311
Document code A