《计算机应用研究》|Application Research of Computers

Vectorization method research dominated by dependence distance (依赖距离主导的向量化方法研究)

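Authors Ding Lili (丁丽丽), Han Lin (韩林), Wang Dong (王冬), Zhang Suping (张素平), Wang Pengxiang (王鹏翔), Yu Haining (于海宁)
Affiliation State Key Laboratory of Mathematical Engineering & Advanced Computing (数学工程与先进计算国家重点实验室), Information Engineering University (信息工程大学), Zhengzhou 450001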
Article ID 1001-3695(2017)05-1311-05
DOI 10.3969/j.issn.1001-3695.2017.05.007
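Abstract Using vector registers without filling them (non-full-load use) creates vectorization opportunities for the many loops that have too few iterations, but it also means that the parallel width of vectorization is no longer fixed, so traditional dependence tests dominated by the vector factor no longer apply. This paper proposes a dependence test dominated by dependence distance. It analyzes the dependence distances carried by the critical cycle-breaking edges of every dependence cycle in the dependence graph, selects the smallest of these distances to determine the parallel width, and breaks the dependence cycles, thereby achieving vectorization based on non-full-load use of vector registers. Experimental results show that the method effectively increases loop vectorization opportunities and improves vector register utilization; the vectorization speedup of the test cases improves by 14.6% on average.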
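Keywords dependence testing; dependence distance; vector factor; parallel width; vectorization; vector register
Funding Supported by the National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) (2009ZX01036-001-001-2)
Article URL http://www.arocmag.com/article/01-2017-05-007.html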
English title Vectorization method research dominated by dependence distance
English author names Ding Lili, Han Lin, Wang Dong, Zhang Suping, Wang Pengxiang, Yu Haining
English affiliation State Key Laboratory of Mathematical Engineering & Advanced Computing, Information Engineering University, Zhengzhou 450001, China
English abstract Although non-full-load use of vector registers provides an opportunity to vectorize the large number of loops that have too few iterations, it also makes the inter-iteration parallelism of vectorization unstable, so traditional dependence test methods dominated by the vector factor are no longer applicable. To solve this problem, this paper presents a dependence test method dominated by dependence distance. The method analyzes the dependence distances carried by the critical edges that can break dependence cycles in the dependence graph and chooses the smallest one, which determines the inter-iteration parallelism and breaks the dependence cycles, thereby achieving vectorization based on non-full-load use of vector registers. Experimental results demonstrate that the method effectively increases the opportunities for vectorizing loops and takes full advantage of non-full-length use of vector registers, improving program performance by 14.6% on average.
English keywords data dependence testing; dependence distance; vector factor; inter-iteration parallelism; vectorization; vector register
Received 2016/3/19
Revised 2016/5/1
Pages 1311-1315
CLC number TP302.7
Document code A
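
Illustrative sketch (not taken from the paper): the abstract's idea of letting the smallest dependence distance, rather than the vector factor, set the parallel width can be tried out with a small Python emulation. The loop a[i] = a[i - d] + b[i], the vector factor of 4, and the helper names vector_factor_test, parallel_width, scalar_loop, and chunked_loop are all assumptions made for this example. The paper's dependence-graph construction and its selection of cycle-breaking edges are not reproduced; the distances are simply supplied by hand.

```python
def vector_factor_test(distances, vector_factor):
    """Vector-factor-dominated test (assumed form): the loop is vectorizable
    only if every loop-carried distance is at least the full vector factor."""
    return all(d >= vector_factor for d in distances)


def parallel_width(distances, vector_factor):
    """Distance-dominated choice (hypothetical helper): the smallest positive
    dependence distance bounds the width, capped at the vector factor."""
    return min([vector_factor, *distances])


def scalar_loop(a, b, d):
    """Reference semantics: a[i] = a[i - d] + b[i], one iteration at a time."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def chunked_loop(a, b, d, width):
    """Emulate SIMD execution with `width` active lanes: every lane of a chunk
    loads its operands before any lane stores its result."""
    a = list(a)
    for start in range(d, len(a), width):
        lanes = range(start, min(start + width, len(a)))
        results = [a[i - d] + b[i] for i in lanes]  # vector load and add
        for i, value in zip(lanes, results):        # vector store
            a[i] = value
    return a
```

Under these assumptions, a loop with dependence distance 3 fails the vector-factor test for a factor of 4, while parallel_width([3], 4) gives a width of 3: chunked_loop then uses three of the four lanes and reproduces the scalar result, whereas forcing the full width of 4 does not.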