
Speculative Software Management of Datapath-width for Energy Optimization

Speculative Software Management of Datapath-width for Energy Optimization. G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin. IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France. Context: embedded applications often operate on 8-/16-bit data, which accounts for more than 50% of program instructions in some cases.


Presentation Transcript


  1. Speculative Software Management of Datapath-width for Energy Optimization. G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin. IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France.

  2. Context • Embedded applications often operate on 8-/16-bit data: more than 50% of program instructions in some cases • This opens new opportunities for energy reduction: clock-gating at a finer granularity, i.e. at the operand level

  3. Exploiting narrow-width operands • Dynamic approach (Brooks et al., HPCA-99): 1. cycle-by-cycle operand gating; 2. complex hardware mechanisms required • Compiler approach (Stephenson et al., PLDI 2000): 1. based on static data flow analysis; 2. must be overly conservative to preserve program correctness

  4. Our approach • We don't want to pay the cost of a hardware scheme to detect when to clock-gate • We don't want to rely on static data flow analysis to discover bit-width ranges • From the compiler approach: switch from normal to narrow-width mode and vice versa (via a reconfiguration instruction) • From the dynamic approach: expose dynamic narrow-width operands to the compiler (via profiling) • Narrow-width execution mode is speculative: exception management allows recovery to the correct mode

  5. Bit-width distribution analysis • Cumulative distribution [Powerstone benchmarks] • [Figure: occurrence of narrow-width operands; series: one operand, two operands]

  6. Bit-width distribution analysis • Dynamic distribution of narrow-width operands at basic block level (adpcm)
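
The bit-width analysis on slides 5 and 6 can be illustrated with a minimal sketch. It assumes operands are signed 32-bit two's-complement values and that "narrow" means fitting in 8 or 16 bits; the names `operand_width` and `narrow_fraction`, and the trace format, are hypothetical and do not come from the presentation.

```python
def operand_width(value: int) -> int:
    """Return the narrowest of 8, 16 or 32 bits that holds a signed value.

    Assumption: the value is interpreted as 32-bit two's complement.
    """
    value &= 0xFFFFFFFF
    if value >= 0x80000000:
        value -= 0x100000000  # reinterpret the bit pattern as negative
    for width in (8, 16):
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    return 32


def narrow_fraction(trace, max_width=16):
    """Fraction of instructions whose operands all fit in max_width bits.

    `trace` is a hypothetical profile: one tuple of operand values per
    executed instruction.
    """
    if not trace:
        return 0.0
    narrow = sum(
        1 for operands in trace
        if all(operand_width(v) <= max_width for v in operands)
    )
    return narrow / len(trace)
```

Applied per basic block, a fraction of this kind is one way a profile could suggest which regions are candidates for narrow-width mode.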

  7. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Simulation platform • Evaluation • Conclusions

  8. Register file model • We address a new dimension: reduce register file activity by reducing register file width • Prior work on reducing register file energy consumption: limited port connectivity; limited number of registers • We propose the byte-slice register file approach: 1. logically split; 2. low-power mode via the drowsy technique, which preserves register cell contents (Flautner et al., ISCA-29) • [Figure: 32-bit register file with row decoder, slice enable signal, per-register tag bits, and slices of 8, 16, and 8 bits]
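
As a rough illustration of the byte-slice idea, the sketch below derives which slices stay active for a given execution mode. The slice layout (low byte, second byte, upper half) and the names `SLICES` and `slice_enables` are assumptions made for this example; the slide only shows slices of 8, 16, and 8 bits and does not specify their order or the tag encoding.

```python
# Assumed slice layout, from least to most significant bits.
SLICES = (("byte0", 8), ("byte1", 8), ("upper_half", 16))


def slice_enables(mode: int) -> dict:
    """Map each slice name to True if it is active in the given mode.

    Slices that are not needed would be put in the drowsy low-power
    state, which keeps their contents.
    """
    if mode not in (8, 16, 32):
        raise ValueError("mode must be 8, 16 or 32")
    enables = {}
    covered = 0  # number of low-order bits already provided
    for name, bits in SLICES:
        enables[name] = covered < mode
        covered += bits
    return enables
```

Under these assumptions, 8-bit mode keeps only `byte0` active, 16-bit mode adds `byte1`, and 32-bit mode enables all three slices.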

  9. Reconfigurable data-path • The data-path is resizable to match the bit-width execution mode (via clock-gating): pipeline latches; ALU • Clock-gating at a coarser granularity • [Figure: data-path showing write-back, slice-enable signal, bypass, ALU, and LSU, each labeled with the 8/16/32 mode]

  10. Exception management • Data-path width misprediction may occur due to a dynamic event • Simple recovery scheme: the tag bits indicate the true data-width; upon a misprediction, trigger an exception and recover to the correct execution mode
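
The recovery scheme can be sketched as a small software model. This is an illustration under stated assumptions, not the hardware mechanism from the paper: tags are modeled directly as widths (8, 16 or 32), recovery is assumed to widen the mode to the width the tags require, and `WidthMisprediction`, `check_operands` and `execute_region` are hypothetical names.

```python
class WidthMisprediction(Exception):
    """Raised when an operand is wider than the current execution mode."""

    def __init__(self, required_width):
        super().__init__(f"operand needs {required_width}-bit mode")
        self.required_width = required_width


def check_operands(mode, operand_tags):
    """Compare the true widths recorded by the tag bits with the mode."""
    required = max(operand_tags, default=8)
    if required > mode:
        raise WidthMisprediction(required)


def execute_region(mode, instructions):
    """Run a region speculatively; return (final_mode, exception_count).

    `instructions` is a list of tuples of operand tag widths.
    """
    exceptions = 0
    for operand_tags in instructions:
        try:
            check_operands(mode, operand_tags)
        except WidthMisprediction as exc:
            exceptions += 1
            mode = exc.required_width  # assumed recovery: widen the mode
    return mode, exceptions
```

For example, starting in 8-bit mode with operand tags `[(8, 8), (16, 8), (8,), (32, 16), (8, 8)]`, the model takes two exceptions and finishes in 32-bit mode.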

  11. Address instructions • Special care must be taken with address instructions • separate address calculation from memory access • Use of dedicated registers for address computation • accumulator registers with additional ISA support (see paper for details)

  12. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Simulation platform • Evaluation • Conclusions

  13. A two-step process • [Figure: flow diagram; labels: input data sets, machine, annotated .s file, Step 1, address transformation, modified .s file, Step 2, annotated .s file]

  14. Profiling • Bit-width characteristics of selected regions • [Figure: bar chart, 0% to 100%, of narrow-width operands and weight of regions in program; legend: 32 bits other, LD/ST with 32 bits, 8/16 bits]

  15. Address instructions transformation • A graph partitioning formulation: G is the data dependence graph (DDG) of a basic block (BB), with an edge (n,m) iff there is a def-use relation between n and m • Problem: transform memory instructions into equivalent accumulator-based instructions • Select (n,m) such that n has a 32-bit operand and m is a LD/ST instruction • Replace m with accumulator-based instructions • Minimize the cut size, i.e. the number of instructions needed to move data from the register file to the accumulators • [Figure: example DDG with add1, load, and add2; labels: add -> Rx, mov Rx -> ACC, LDACC Ry]
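
The selection step of this formulation can be sketched as follows. The sketch only picks the def-use edges named on the slide and counts them; it does not attempt the minimization itself. The `Instr` record, the one-move-per-edge cost model, and the example graph are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    width: int            # widest operand, in bits (8, 16 or 32)
    is_mem: bool = False  # True for LD/ST instructions


def select_edges(ddg_edges):
    """Keep def-use edges (n, m) where n is 32-bit and m is a LD/ST."""
    return [(n, m) for n, m in ddg_edges if n.width == 32 and m.is_mem]


def cut_size(ddg_edges):
    """Assumed cost: one register-to-accumulator move per selected edge."""
    return len(select_edges(ddg_edges))


# Hypothetical basic block: a 32-bit address computation feeding a load.
add1 = Instr("add1", 32)
load = Instr("load", 8, is_mem=True)
add2 = Instr("add2", 8)
edges = [(add1, load), (load, add2)]
```

Here `select_edges(edges)` keeps only the `(add1, load)` edge, so the cut size is 1: a single move into an accumulator before the load is rewritten in accumulator-based form.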

  16. Instructions reordering • Problem: reorder instructions in a basic block such that operations with 32-bit operands are moved around 8/16-bit operations
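
One way to picture this reordering is a greedy list scheduler that keeps narrow operations together while respecting dependences. This is a simple heuristic chosen for illustration, not the algorithm from the paper; `reorder`, `mode_switches`, and the input format are hypothetical, and the dependence graph is assumed to be acyclic.

```python
def reorder(widths, deps):
    """Greedy reordering of one basic block.

    widths: dict mapping instruction name to 8, 16 or 32, in program order.
    deps:   dict mapping instruction name to the set of names it depends on.
    """
    order = list(widths)
    scheduled, done = [], set()
    current_narrow = None  # class of the last scheduled instruction
    while len(scheduled) < len(order):
        ready = [i for i in order
                 if i not in done and deps.get(i, set()) <= done]
        # Prefer a ready instruction of the same class (narrow or 32-bit)
        # as the previous one; otherwise take the first ready instruction.
        same = [i for i in ready if (widths[i] < 32) == current_narrow]
        pick = (same or ready)[0]
        scheduled.append(pick)
        done.add(pick)
        current_narrow = widths[pick] < 32
    return scheduled


def mode_switches(schedule, widths):
    """Count transitions between narrow and 32-bit instructions."""
    classes = [widths[i] < 32 for i in schedule]
    return sum(1 for a, b in zip(classes, classes[1:]) if a != b)
```

With `widths = {"a": 8, "b": 32, "c": 8, "d": 32, "e": 16}` and `deps = {"c": {"a"}, "d": {"b"}, "e": {"c"}}`, the original order has four mode switches, while `reorder` yields `a, c, e, b, d` with one.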

  17. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Evaluation • Conclusions

  18. Simulation platform • Tools: CACTI (register file access energy); HotLeakage (leakage energy) • Lx processor platform: in-order; 4-issue width; 64 32-bit GPRs; 8 1-bit CBRs; 6-stage pipeline; 4 ALUs, 1 LSU; 2 MULs

  19. Summary of results • IPC degradation with varying misprediction penalty and varying bit-width convergence

  20. Summary of results • Dynamic energy reduction

  21. Summary of results • Register file static energy savings

  22. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Evaluation • Conclusions

  23. Conclusions • Contribution to power-aware compilation: speculative management of the processor data-path in software; simple exception management scheme to repair a software misprediction • Evaluation results: 17% data-path dynamic energy savings; 22% register file static energy savings; performance impact varies with the implementation cost of the recovery scheme • Future work: evaluation with larger granularity (e.g. trace), which can reduce the number of mispredictions and the number of reconfiguration instructions

  24. Thanks ! Questions …
