《计算机应用研究》|Application Research of Computers

数字标准平台中海量时空小文件合并策略研究

Study on consolidation strategy of small temporal-spatial files in digital standard platform

Authors: 顾鑫, 熊炼
Affiliations: 1. 湖北省标准化研究院, 武汉 430061; 2. 武汉大学 测绘遥感信息工程国家重点实验室, 武汉 430079
Article ID: 1001-3695(2014)11-3340-04
DOI: 10.3969/j.issn.1001-3695.2014.11.032
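Abstract: HDFS handles small spatio-temporal files inefficiently. To address this, the paper starts from the correlation between users' access patterns and the attributes of the data they access. It treats a user's access stream as a sequence of requests for data files, represents each file parametrically by its spatio-temporal attributes, and applies feature extraction to build a new feature sequence. The PrefixSpan sequential pattern mining algorithm is then used to find the feature templates that characterize different access modes, and related files are merged accordingly. Experimental results show that this merging strategy effectively reduces NameNode memory usage and response time and improves read efficiency.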
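Keywords: digital standard platform; HDFS; small files; spatio-temporal data; sequential pattern mining
Funding: National Natural Science Foundation of China (61263040, 61075015)
Article URL: http://www.arocmag.com/article/01-2014-11-032.html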
Author names (English): GU Xin, XIONG Lian
Affiliations (English): 1. Hubei Institute of Standardization, Wuhan 430061, China; 2. State Key Laboratory of Information Engineering in Surveying, Mapping & Remote Sensing, Wuhan University, Wuhan 430079, China
Abstract (English, as published): Aiming at the low efficiency of processing small files in HDFS, and starting from the correlation between users' access patterns and data attributes, this paper treats user access streams as request sequences over data files and parameterizes the data on the basis of their spatial and temporal properties. After generating new feature sequences by feature extraction, it finds the feature templates of different access modes through sequential pattern mining with the PrefixSpan algorithm. Experimental results show that the consolidation strategy effectively reduces NameNode memory usage and response time, and improves the system's read efficiency.
Keywords (English, as published): digital standard platform; HDFS; small file; spatio-temporal data; sequential pattern mining
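
The mining step described in the abstract can be illustrated with a minimal sketch. It assumes that each file request has already been reduced to a single feature token; the strings such as "A-t1" are hypothetical stand-ins for a region and a time period, since the paper's actual parameterization is not given on this page. The code is a basic single-item PrefixSpan, not the authors' implementation, and the names `prefixspan` and `merge_candidates`, the sample sessions, and the support threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for every frequent sequential pattern.

    Each sequence is a list of single items (one feature token per request).
    Support is the number of sequences that contain the pattern as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix of
        # each sequence that remains after matching `prefix`.
        support = defaultdict(set)   # item -> ids of sequences containing it
        first_pos = {}               # (item, sequence id) -> first position
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if (item, sid) not in first_pos:
                    first_pos[(item, sid)] = pos
                    support[item].add(sid)
        for item, sids in support.items():
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Project each supporting sequence past the first match of `item`.
            mine(pattern, [(sid, first_pos[(item, sid)] + 1) for sid in sids])

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


def merge_candidates(patterns, min_length=2):
    """Hypothetical helper: patterns spanning at least `min_length` features,
    longest and best supported first. Files matching the features of one
    pattern are candidates for merging into one larger file."""
    return sorted(
        (p for p in patterns if len(p) >= min_length),
        key=lambda p: (-len(p), -patterns[p]),
    )


# Hypothetical access sessions, one list of feature tokens per user session.
sessions = [
    ["A-t1", "B-t1", "C-t1"],
    ["A-t1", "C-t1"],
    ["A-t1", "B-t1", "C-t1", "D-t2"],
    ["B-t1", "D-t2"],
]

patterns = prefixspan(sessions, min_support=3)
print(merge_candidates(patterns))
```

In this toy data the only multi-feature pattern that reaches the threshold is the pair ("A-t1", "C-t1"), so files carrying those two features would be the natural candidates to store together. How the paper maps a mined template onto concrete merged files is not stated on this page.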
Received: 2013/11/15
Revised: 2013/12/26
Pages: 3340-3343
CLC number: TP333
Document code: A