Power Awareness through Selective Dynamically Optimized Traces
Roni Rosner, Yoav Almog, Micha Moffie, Naftali Schwartz and Avi Mendelson (Intel Labs, Haifa, Israel)
Presenter: Ioana Burcea
Agenda
• Motivation for PARROT = Power-Aware aRchitecture Running Optimized Traces
• PARROT Concept and Architecture
• Performance and Energy Results
• Discussion
  • What makes PARROT a power-aware architecture?
  • What is new about this paper? / What are the contributions of this paper?
Motivation
• We pay more energy per task
• Poor scaling of performance with power consumption
• PARROT tries to change the balance
• Filtering Techniques to Improve Trace-Cache Efficiency (PACT 2001)
• Selecting Long Atomic Traces for High Coverage (ICS 2003)
• Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture (CGO 2004)
PARROT Concepts – The Big Picture
• Based on the well-known cold/hot (10/90) paradigm: a small hot fraction of the code accounts for most of the execution
• PARROT Principles
  • Reuse: trace-cache centric
  • Dynamic optimizations: more performance with less energy
  • Focus: invest where it pays
  • Pipeline decoupling: hybrid front-end, cold and hot execution pipelines
  • Transparency: no s/w compatibility issues
Traces and Trace Selection
• Decoded atomic traces
  • Complex retirement & recovery in case of misprediction
  • More aggressive optimizations
• Trace Selection: deterministic criteria
  • Capacity limitation: 64 uops
  • Complete basic blocks
  • Terminating CTI (control-transfer instructions)
    • Indirect jumps, software exceptions, backward taken branches
  • Return instructions: procedure inlining
  • Trace join
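The deterministic selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism from the paper: the `BasicBlock` type, the `end_kind` labels and the `select_traces` function are hypothetical names, every basic block is assumed to fit within the 64-uop limit, and the return-inlining and trace-join rules are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_TRACE_UOPS = 64  # capacity limit stated on the slide

# CTI kinds that terminate a trace, per the slide (labels are hypothetical).
TERMINATING_CTI = {"indirect_jump", "sw_exception", "backward_taken_branch"}


@dataclass
class BasicBlock:
    uops: int      # number of decoded uops in the block
    end_kind: str  # hypothetical label for the instruction that ends the block


def select_traces(blocks: Iterable[BasicBlock]) -> List[List[BasicBlock]]:
    """Split a dynamic stream of basic blocks into traces.

    Assumes each block has at most MAX_TRACE_UOPS uops.
    """
    traces: List[List[BasicBlock]] = []
    current: List[BasicBlock] = []
    size = 0
    for bb in blocks:
        # Only complete basic blocks are added: if this one does not fit,
        # close the current trace and start a new one with it.
        if current and size + bb.uops > MAX_TRACE_UOPS:
            traces.append(current)
            current, size = [], 0
        current.append(bb)
        size += bb.uops
        # A terminating CTI always ends the trace.
        if bb.end_kind in TERMINATING_CTI:
            traces.append(current)
            current, size = [], 0
    if current:
        traces.append(current)
    return traces
```

Because the rules depend only on the instruction stream, the same stream always yields the same traces, which is what "deterministic" means here.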
Microarchitecture
• Split-execution vs. unified-execution
• Foreground phase: fetch-to-execution pipeline
• Background phase (post-processing): trace selection and optimization
Microarchitecture (cont’d)
• Two predictors (GHR = Global History Register)
  • Branch predictor
  • Trace predictor
• Deterministic trace build scheme
• Filtering mechanisms:
  • The hot filter selects frequent traces from those executed on the cold pipeline
  • The blazing filter selects the hottest traces for optimization
• Dynamic optimizations
  • Generic and core-specific optimizations
  • Gradually applied (?)
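One way to picture the two filters is as a pair of counting stages. The sketch below is an assumption made for illustration: the slides do not say that the filters are counter-based, and the class name `TraceFilters`, the threshold values and the path labels are all hypothetical.

```python
from collections import Counter

HOT_THRESHOLD = 8       # hypothetical: cold executions before a trace is cached
BLAZING_THRESHOLD = 32  # hypothetical: hot executions before optimization


class TraceFilters:
    """Toy two-stage filter: cold -> hot (trace cache) -> blazing (optimized)."""

    def __init__(self, hot=HOT_THRESHOLD, blazing=BLAZING_THRESHOLD):
        self.hot_threshold = hot
        self.blazing_threshold = blazing
        self.cold_counts = Counter()  # executions seen on the cold pipeline
        self.hot_counts = Counter()   # executions served from the trace cache
        self.trace_cache = set()      # traces promoted by the hot filter
        self.optimized = set()        # traces promoted by the blazing filter

    def execute(self, trace_id) -> str:
        """Record one execution and return the path that served it."""
        if trace_id not in self.trace_cache:
            self.cold_counts[trace_id] += 1
            if self.cold_counts[trace_id] >= self.hot_threshold:
                self.trace_cache.add(trace_id)  # hot filter fires
            return "cold"
        # The path is decided before updating counters: optimization is a
        # background step, so it only benefits later executions.
        path = "hot-optimized" if trace_id in self.optimized else "hot"
        self.hot_counts[trace_id] += 1
        if self.hot_counts[trace_id] >= self.blazing_threshold:
            self.optimized.add(trace_id)  # blazing filter fires
        return path
```

With small thresholds such as `TraceFilters(hot=2, blazing=3)`, a repeatedly executed trace runs cold twice, then hot three times, and is served in optimized form from the sixth execution on. A trace seen only once never leaves the cold pipeline, which is the "invest where it pays" idea.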
Simulation framework
• An “in-house” proprietary performance and power simulator
• Optimizations applied as different passes
• Optimization delay for one trace ~ 100 cycles
• Energy simulation
  • Power consumption matrix for each operation on each hardware unit
  • Leakage
    • Uniform leakage in space (over the processor core and L2 cache) and in time (modeling a high temperature)
    • LE = PMAX * (0.05 * M + 0.4 * K) * CYC
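The leakage formula can be written as a one-line function, which makes its structure easier to see. The slides do not define M and K, so the sketch below treats them as opaque model parameters; the function and argument names are hypothetical, and no units are assumed beyond what the formula implies.

```python
def leakage_energy(p_max: float, m: float, k: float, cycles: float) -> float:
    """LE = PMAX * (0.05 * M + 0.4 * K) * CYC, as given on the slide.

    m and k are the undefined M and K parameters from the slide; they are
    passed through unchanged.
    """
    return p_max * (0.05 * m + 0.4 * k) * cycles
```

Under this model, leakage grows linearly with the cycle count and does not depend on which operations run, so any speedup also reduces leakage energy for a fixed task.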
Experimental Evaluation
• Metrics
  • IPC
  • Total energy
  • Cubic-MIPS-per-Watt (CMPW)
    • A measure of the design tradeoffs between power and performance
• Benchmarks
  • SpecInt2000
  • SpecFP2000
  • Office
  • Multimedia
  • DotNet
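As a worked illustration of the CMPW metric, the sketch below computes MIPS cubed divided by average power from an instruction count, a run time and a total energy. The function name and inputs are hypothetical, and the paper may normalize the metric differently; this only shows the general shape of a cubic-MIPS-per-Watt figure.

```python
def cmpw(instructions: float, seconds: float, joules: float) -> float:
    """Cubic-MIPS-per-Watt: (MIPS ** 3) / average power in watts."""
    mips = instructions / (seconds * 1e6)  # millions of instructions per second
    watts = joules / seconds               # average power over the run
    return mips ** 3 / watts
```

Since performance enters cubed and power only linearly, the metric rewards a design that gains speed at a modest power cost and penalizes one that buys a small speedup with a large power increase.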
Our Conclusions
• What makes PARROT a power-aware architecture?
• What is new about this paper? / What are the contributions of this paper?
• rePlay (?)