Application Research of Computers (《计算机应用研究》)

Two-level retirement mechanism based on speculative execution

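Authors 段凌霄, 孟建熠, 李晓明
Affiliation Zhejiang University: a. College of Electrical Engineering; b. Department of Information Science & Electronic Engineering, Hangzhou 310027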
Article ID 1001-3695(2015)04-1032-04
DOI 10.3969/j.issn.1001-3695.2015.04.017
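Abstract To address the slow retirement caused by instructions occupying the reorder buffer for long periods in superscalar processors, this paper proposes a two-level retirement mechanism based on speculative execution. The scheme classifies instructions as risky or risk-free according to whether they carry a risk of exception or misprediction, and slims down the reorder buffer accordingly: only instructions with exception or misprediction risk are allowed to enter the reorder buffer, and they are retired quickly once the risk is confirmed to be eliminated. The rename registers are separated from the reorder buffer and handle register renaming and out-of-order result write-back. Experimental results show that, with the same hardware resources, a processor based on this scheme outperforms a conventional in-order-retirement processor by more than 28.8% on average.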
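The classification step described in the abstract can be illustrated with a toy occupancy model. This is only a minimal sketch under stated assumptions, not the authors' design: the set of risky opcodes, the class and method names (`LightweightROB`, `dispatch`, `clear_risk`), and the in-order release of entries from the head are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical risk classes for illustration; the abstract does not enumerate them.
RISKY_OPCODES = {"load", "store", "branch"}


@dataclass
class Instr:
    seq: int                    # program-order sequence number
    opcode: str
    risk_cleared: bool = False  # set once exception/misprediction risk is gone


def is_risky(instr):
    """True if the instruction may raise an exception or depends on a prediction."""
    return instr.opcode in RISKY_OPCODES


class LightweightROB:
    """Toy reorder buffer that tracks only risky instructions, in program order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def dispatch(self, instr):
        """Allocate an entry for a risky instruction; return False on a full-ROB stall."""
        if not is_risky(instr):
            return True  # risk-free instructions never occupy an entry
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr)
        return True

    def clear_risk(self, seq):
        """Mark one entry as risk-free, then release resolved entries from the head.

        Releasing strictly from the head is an assumption of this sketch.
        """
        for entry in self.entries:
            if entry.seq == seq:
                entry.risk_cleared = True
        released = []
        while self.entries and self.entries[0].risk_cleared:
            released.append(self.entries.popleft().seq)
        return released


# Five instructions, but only the load and the branch need ROB entries.
program = [Instr(0, "add"), Instr(1, "load"), Instr(2, "mul"),
           Instr(3, "branch"), Instr(4, "sub")]
rob = LightweightROB(capacity=2)
stalled = [i.seq for i in program if not rob.dispatch(i)]
```

In this model a two-entry buffer accepts all five instructions without stalling, because the three arithmetic instructions bypass it; a conventional reorder buffer of the same size would be full after the second instruction.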
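Keywords speculative execution; reorder buffer; fast retirement; out-of-order write-back; superscalar
Funding National Science and Technology Major Project of China (Core Electronic Devices, High-end Generic Chips and Basic Software program), grant 2009ZX01030-001-002
Article URL http://www.arocmag.com/article/01-2015-04-017.html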
Authors (English) DUAN Ling-xiao, MENG Jian-yi, LI Xiao-ming
Affiliation (English) a. College of Electrical Engineering, b. Dept. of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
English abstract In high-performance superscalar microprocessors, instructions stay in the reorder buffer too long, so instructions in the pipeline retire slowly. This paper proposes a two-level retirement architecture based on speculative execution, in which instructions are classified according to their risk of exception and misprediction. Only risky instructions are allocated entries in the reorder buffer, and they retire once the risk is confirmed to be eliminated. A result buffer is separated from the reorder buffer to perform register renaming and out-of-order write-back. Experiments show that, with the same resources, performance improves by at least 28.8% compared with the traditional architecture.
English keywords speculative execution; reorder buffer (ROB); fast retirement; out-of-order write-back; superscalar
Received 2014/4/6
Revised 2014/5/19
Pages 1032-1035
CLC number TP332
Document code A