Energy Efficient Register Renaming. Gurhan Kucuk, Oguz Ergin, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower.
PATMOS 2003
Energy Efficient Register Renaming
Gurhan Kucuk, Oguz Ergin, Dmitry Ponomarev, Kanad Ghose
Department of Computer Science, State University of New York, Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower
13th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS’03), September 11th 2003
*supported in part by DARPA through the PAC-C program and NSF
Outline
• Motivations
• The Register Alias Table (RAT) Complexity
• Complexity-Effective RAT Designs
  • Exploiting the intra-group dependencies
  • Buffering recent address translations
• Results and discussions
• Conclusions
Motivation
• The RAT maintains the register address translations needed for handling true data dependencies
• High power dissipation
  • 14% of the overall power is attributed to the RAT in the global power analysis performed in [FG 01]
• High power density
The RAT Complexity (W-way CPU)
• W write ports to update the W RAT entries for the destinations of the co-dispatched instructions
• 2W read ports to translate the source register addresses
• W read ports for checkpointing the old mapping of the destination register
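The port counts above scale linearly with the dispatch width W. A minimal sketch of that arithmetic, assuming two source operands per instruction (which is what the 2W figure implies); the function name `rat_port_counts` and its parameters are illustrative, not from the slides:

```python
def rat_port_counts(width: int, sources_per_instruction: int = 2) -> dict:
    """Count RAT ports for a W-way machine, following the slide's breakdown.

    `sources_per_instruction` is an assumption (the slide's 2W read ports
    correspond to two sources per instruction).
    """
    write_ports = width                                  # one per destination
    source_read_ports = sources_per_instruction * width  # source translations
    checkpoint_read_ports = width                        # old destination mappings
    return {
        "write": write_ports,
        "source_read": source_read_ports,
        "checkpoint_read": checkpoint_read_ports,
        "total": write_ports + source_read_ports + checkpoint_read_ports,
    }


if __name__ == "__main__":
    for w in (2, 4, 8):
        print(w, rat_port_counts(w))
```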
Register Renaming Steps
• Step 1. The following substeps are performed in parallel:
  • RAT reads for the sources of each of the co-dispatched instructions are performed, assuming that no dependencies exist among the instructions.
  • New physical registers are allocated for the destination registers of all of the co-dispatched instructions.
  • Data dependencies among the instructions are noted, using a set of comparators. The address of each destination register in a group of instructions is compared against the sources of all following instructions in the group; a match signals a dependency.
• Step 2. If a data dependency is detected between a pair of instructions, the source physical register of the dependent instruction, as read out from the RAT, is replaced with the destination register address allocated to the producing instruction, preserving the true dependency.
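The two steps above can be sketched in software as follows. This is a sequential model of what the hardware does in parallel, under stated assumptions: the RAT is a plain dictionary, the free list is a Python list, and the names `Instr` and `rename_group` are hypothetical. Checkpointing of old mappings is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instr:
    dest: str                # architectural destination register
    srcs: Tuple[str, ...]    # architectural source registers


def rename_group(group: List[Instr], rat: Dict[str, str],
                 free_list: List[str]) -> List[Tuple[str, List[str]]]:
    """Rename one group of co-dispatched instructions; updates `rat` in place."""
    # Step 1a: read the RAT for every source, as if no dependencies existed.
    src_phys = [[rat[s] for s in ins.srcs] for ins in group]
    # Step 1b: allocate a new physical register for every destination.
    new_phys = [free_list.pop(0) for _ in group]
    # Step 1c + Step 2: compare each destination against the sources of all
    # following instructions; on a match, override the RAT result with the
    # producer's newly allocated register. Scanning earlier instructions in
    # program order lets the most recent producer win.
    for j, ins in enumerate(group):
        for k, src in enumerate(ins.srcs):
            for i in range(j):
                if group[i].dest == src:
                    src_phys[j][k] = new_phys[i]
    # Write the new destination mappings into the RAT (later writes win).
    for ins, phys in zip(group, new_phys):
        rat[ins.dest] = phys
    return [(new_phys[j], src_phys[j]) for j in range(len(group))]


if __name__ == "__main__":
    rat = {f"R{n}": f"P{n}" for n in range(1, 6)}
    group = [Instr("R1", ("R2", "R3")),   # ADD  R1, R2, R3
             Instr("R4", ("R1", "R3")),   # LOAD R4, R1, R3
             Instr("R5", ("R4", "R2"))]   # SUB  R5, R4, R2
    for renamed in rename_group(group, rat, ["P10", "P11", "P12"]):
        print(renamed)
```

The example group is the ADD/LOAD/SUB sequence used on the CSense slides; the physical register names are made up for illustration.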
Conditional Sensing (CSense): Exploiting the Intra-Group Dependencies
• CSense disables parts of the RAT read access if an intra-group data dependency is noted
• Example group:
  ADD R1, R2, R3
  LOAD R4, R1, R3
  SUB R5, R4, R2
• Source R1 of LOAD is produced by ADD, and source R4 of SUB is produced by LOAD
• Comparator match (output 1): sense-amp enable = 0 (disable sense amp)
• No comparator match (all outputs 0): sense-amp enable = 1 (enable sense amp)
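The enable logic in the CSense example can be modeled as a NOR over the comparator outputs for each source operand. The sketch below is a simplified model, assuming a single boolean enable per source read (the slides say only that "parts of" the read access are disabled); `sense_amp_enables` is a hypothetical name.

```python
from typing import List, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination, sources), architectural names


def sense_amp_enables(group: List[Instr]) -> List[List[bool]]:
    """Return, per instruction and per source, whether the RAT sense amps stay on.

    A source read keeps its sense amps enabled only if no earlier instruction
    in the same group writes that register (no comparator reports a match).
    """
    enables = []
    for j, (_, srcs) in enumerate(group):
        earlier_dests = [dest for dest, _ in group[:j]]
        enables.append([src not in earlier_dests for src in srcs])
    return enables


if __name__ == "__main__":
    group = [("R1", ("R2", "R3")),   # ADD  R1, R2, R3
             ("R4", ("R1", "R3")),   # LOAD R4, R1, R3
             ("R5", ("R4", "R2"))]   # SUB  R5, R4, R2
    for (dest, srcs), en in zip(group, sense_amp_enables(group)):
        print(dest, dict(zip(srcs, en)))
```

For this group, the reads of R1 (by LOAD) and R4 (by SUB) are the ones whose sense amps are disabled; the other four source reads proceed normally.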
Figure: Percentage of source operands that are produced by co-dispatched instructions in the same cycle (y-axis: %)
Buffering Recent Address Translations
• SPEC 2000 simulations show that dependent instructions are usually very close to each other
• If the register needed as a source is defined by an earlier co-dispatched instruction, the CSense scheme applies
• If the register needed as a source is defined by an instruction dispatched in a previous cycle, we use External Latches (ELs) to serve recent register mappings faster
Buffering Recent Address Translations
• The RAT access for a source register now proceeds as follows:
  • Start accessing the RAT and, at the same time, address the ELs to see if the desired entry is located in one of them
  • If a matching entry is found, discontinue the access to the RAT
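The lookup sequence above can be sketched as a small buffer sitting in front of the RAT. Several details here are assumptions not stated on the slides: new destination mappings are inserted into the latches on every RAT update, replacement is FIFO, and the default of four latches follows the configuration named on the next slide. The class name `LatchedRAT` and its counters are illustrative only.

```python
from collections import OrderedDict
from typing import Dict


class LatchedRAT:
    """RAT model with a few external latches holding recent translations."""

    def __init__(self, rat: Dict[str, str], num_latches: int = 4):
        self.rat = dict(rat)
        self.latches: "OrderedDict[str, str]" = OrderedDict()  # oldest first
        self.num_latches = num_latches
        self.latch_hits = 0       # RAT access discontinued
        self.full_rat_reads = 0   # RAT access ran to completion

    def lookup(self, arch_reg: str) -> str:
        # In hardware the RAT and the latches are addressed at the same time;
        # a latch hit lets the RAT read be discontinued.
        if arch_reg in self.latches:
            self.latch_hits += 1
            return self.latches[arch_reg]
        self.full_rat_reads += 1
        return self.rat[arch_reg]

    def update(self, arch_reg: str, phys_reg: str) -> None:
        # Record the new mapping in the RAT and in the latches (assumed policy).
        self.rat[arch_reg] = phys_reg
        self.latches.pop(arch_reg, None)         # drop any stale copy
        self.latches[arch_reg] = phys_reg
        if len(self.latches) > self.num_latches:
            self.latches.popitem(last=False)     # evict the oldest entry


if __name__ == "__main__":
    r = LatchedRAT({"R1": "P1", "R2": "P2"})
    r.update("R1", "P10")
    print(r.lookup("R1"), r.lookup("R2"), r.latch_hits, r.full_rat_reads)
```

Removing the stale copy before reinserting keeps the latches consistent with the RAT, so a hit never returns an outdated translation.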
Figure: Renaming Logic with Four External Latches
Figure: The Hit Ratio to External Latches (y-axis: %)
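Experimental Setup (AccuPower, DATE’02)
• Inputs to the Microarchitectural Simulator: compiled SPEC benchmarks, datapath specs
• Outputs of the Microarchitectural Simulator: performance stats; transition counts and context information
• VLSI layout data → SPICE deck → SPICE → SPICE measures of energy per transition
• Energy/Power Estimator: combines the transition counts and context information with the SPICE measures to produce power/energy stats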
Figure: Energy of the Baseline and Proposed RAT Designs (y-axis: pJ). Energy savings annotated on the chart: 15%, 19%, 27%, 30%
Conclusions
• Two techniques to reduce RAT power have been proposed:
  • CSense
  • Buffering Recent Register Mappings
• 30% energy savings
• No performance penalty
• Little additional complexity
• No increase in the processor’s cycle time