《计算机应用研究》|Application Research of Computers

A new compilation framework based on SLP (一种基于SLP的新型编译框架)

New framework based on SLP

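Authors Zhang Suping (张素平), Wang Dong (王冬), Ding Lili (丁丽丽), Wang Pengxiang (王鹏翔), Gong Yi (宫一), Yu Haining (于海宁)
Affiliation State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450001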
Article ID 1001-3695(2017)01-0021-06
DOI 10.3969/j.issn.1001-3695.2017.01.004
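Abstract The SLP (superword level parallelism) algorithm cannot efficiently handle large applications in which parallel code makes up only a small share of the program. To address this problem, the paper proposes and evaluates a new compilation framework based on an improved SLP algorithm. The framework has three main phases: first, the improved SLP algorithm converts non-isomorphic statements with similar structure into isomorphic statements wherever possible; second, taking a global view, the framework captures the data reuse patterns of the target code before optimizing it; third, it applies data layout optimization for further performance gains. Extensive experiments on the framework show that it outperforms the SLP algorithm, improving performance by about 15.3%.
Keywords superword parallelism; isomorphic; superword reuse; data layout
Funding National Science and Technology Major Project of China, "Core Electronic Devices, High-end Generic Chips and Basic Software" program (2009ZX01036-001-001-2)
Article URL http://www.arocmag.com/article/01-2017-01-004.html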
English title New framework based on SLP
Authors (English) Zhang Suping, Wang Dong, Ding Lili, Wang Pengxiang, Gong Yi, Yu Haining
Affiliation (English) State Key Laboratory of Mathematical Engineering & Advanced Computing, PLA Information Engineering University, Zhengzhou 450001, China
English abstract The SLP (superword level parallelism) algorithm cannot efficiently handle large-scale applications that contain little parallel code. This paper proposes and evaluates a new compilation framework based on an improved SLP algorithm. The framework has three phases. First, it uses the improved algorithm to transform non-isomorphic but similar instruction sequences into isomorphic instruction sequences wherever possible. Second, it takes a global view of the target application, capturing superword reuse patterns before making optimization decisions. Finally, it applies data layout optimization for further performance improvement. Extensive experiments indicate that the framework outperforms the SLP algorithm, increasing performance by about 15.3%.
English keywords superword parallel; isomorphic; superword reuse; data layout
Received 2016/1/6
Revised 2016/5/19
Pages 21-26
CLC number TP314
Document code A