Explore the techniques of pipeline gating and confidence estimation to reduce energy consumption in processors, with emphasis on SPEC and PVN metrics. Learn about cache decay strategies to minimize leakage power and overheads involved.
CS 7810 Lecture 13
Pipeline Gating: Speculation Control For Energy Reduction
S. Manne, A. Klauser, D. Grunwald
Proceedings of ISCA-25, June 1998
Cost of Speculation
• Mispredict rates (%): 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7
Pipeline Gating
• Low-confidence branches throttle instruction fetch until they are resolved
• Pipeline gating usually lasts for fewer than five cycles
Metrics
• SPEC (specificity): fraction of all mispredicted branches detected as low-confidence by the confidence estimator (coverage)
• PVN (predictive value of a negative test): probability of a low-confidence branch being mispredicted (accuracy)
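The two metrics can be computed from a per-branch trace. The following is a minimal sketch, assuming each branch is recorded as a pair of booleans (mispredicted, flagged low-confidence); the function name and the trace format are illustrative and not taken from the lecture.

```python
def spec_and_pvn(outcomes):
    """Return (SPEC, PVN) for an iterable of (mispredicted, low_confidence) pairs.

    The pair format is an assumption made for this sketch.
    """
    mispredicted = 0           # all mispredicted branches
    low_conf = 0               # all branches flagged low-confidence
    low_conf_mispredicted = 0  # branches that are both
    for was_mispredicted, was_low_conf in outcomes:
        mispredicted += bool(was_mispredicted)
        low_conf += bool(was_low_conf)
        low_conf_mispredicted += bool(was_mispredicted and was_low_conf)
    # SPEC: share of mispredicted branches that the estimator flagged (coverage)
    spec = low_conf_mispredicted / mispredicted if mispredicted else 0.0
    # PVN: share of flagged branches that really were mispredicted (accuracy)
    pvn = low_conf_mispredicted / low_conf if low_conf else 0.0
    return spec, pvn


# Hypothetical trace: 2 mispredicts, 3 low-confidence flags, 1 overlap
trace = [(True, True), (True, False), (False, True), (False, False), (False, True)]
spec, pvn = spec_and_pvn(trace)
```

For this made-up trace, SPEC is 1/2 and PVN is 1/3: the estimator catches half of the mispredicts, but only a third of its low-confidence flags are real mispredicts.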
Confidence Estimators
• Perfect: to gauge potential benefits
• Static: branches that have low prediction rates are low-confidence
• JRS: if a branch has yielded N successive correct predictions, it has high confidence
• Saturating counters: an unbiased counter value, or disagreement between two predictors, indicates low confidence
• Distance: mispredicts are clustered, hence the first 4 branches after a mispredict have low confidence
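The JRS rule above (N successive correct predictions means high confidence) can be sketched with a table of resetting counters. This is a simplified illustration only: the class name, the default threshold, the table size, and indexing by PC modulo table size are all assumptions for the sketch, not details from the lecture.

```python
class ResettingCounterEstimator:
    """JRS-style confidence estimator sketch: count successive correct predictions."""

    def __init__(self, threshold=4, table_size=1024):
        self.threshold = threshold      # the "N" from the slide (value assumed)
        self.table = [0] * table_size   # one counter per table entry

    def _index(self, pc):
        # Assumed indexing scheme: branch PC modulo table size
        return pc % len(self.table)

    def is_high_confidence(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            # Saturate at the threshold; more history is not needed
            self.table[i] = min(self.table[i] + 1, self.threshold)
        else:
            # A mispredict resets the run of correct predictions
            self.table[i] = 0
```

A branch starts out low-confidence, becomes high-confidence after `threshold` correct predictions in a row, and drops back to low-confidence on the next mispredict.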
SPEC and PVN
• SPEC (coverage): mispredicted branches detected by the low-confidence estimator
• PVN (accuracy): % of low-confidence branches that are branch mispredicts
• It is easier to achieve a high SPEC value than a high PVN value
• A high PVN value can be achieved by using N low-confidence branches to invoke gating: if PVN is 30%, redefining low confidence as two low-confidence branches increases PVN to 51%
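The 30% to 51% figure is consistent with a simple probability argument: if each low-confidence branch is mispredicted independently with probability PVN, the chance that at least one of N such branches is mispredicted is 1 − (1 − PVN)^N. The independence assumption and the function name are mine; the sketch below only reproduces the arithmetic.

```python
def combined_pvn(pvn, n):
    """Probability that at least one of n low-confidence branches is mispredicted.

    Assumes each branch is mispredicted independently with probability pvn.
    """
    return 1.0 - (1.0 - pvn) ** n


print(round(combined_pvn(0.30, 2), 2))  # prints 0.51
```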
Results
• Can gating improve performance? Only if cache pollution is significant
• Less than 1% performance loss and up to 38% reduction in extra work
• Energy consumption could go up: some work is independent of the number of executed instructions (clock distribution), so increased execution time can increase energy
• Pipeline gating should reduce power consumption
CS 7810 Lecture 13
Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28, July 2001
Leakage Power Trends
• Circuit delay ∝ 1/(V − Vth)
• Leakage ∝ number of transistors (increasing)
• Leakage ∝ supply voltage (decreasing)
• Leakage depends exponentially on a low threshold voltage (increasing)
• L1 and L2 caches are the biggest contributors (high transistor budgets)
Vdd-Gating
• Leakage can be reduced by gating off the supply voltage to the circuit
• When applied to a cache, the contents of the SRAM cell are lost
• Cache decay: apply Vdd-gating when you do not care about cache contents
Overheads
• Hardware to determine when to decay
• Introduces additional cache misses
• Normalized cache leakage power = active ratio (fraction of cache that is powered on) + (counter overhead : leak) × activity + (L2 access energy : leak) × num-misses
• Increased execution time (< 0.7%)
• L2 access/leakage ratio is ~9
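The normalized leakage expression above is a sum of three terms, which the following sketch evaluates directly. The slide does not spell out how activity and num-misses are normalized, so the sketch assumes they are already expressed as rates on the same basis as the leakage energy; all input values in the example except the ratio of 9 are hypothetical.

```python
def normalized_leakage_power(active_ratio, counter_to_leak, activity,
                             l2_to_leak, extra_misses):
    """Evaluate the three-term normalized leakage expression from the slide.

    active_ratio    : fraction of the cache that is powered on
    counter_to_leak : counter overhead energy relative to leakage energy
    activity        : counter activity (assumed already normalized)
    l2_to_leak      : L2 access energy relative to leakage energy (~9 per the slide)
    extra_misses    : decay-induced misses (assumed already normalized)
    """
    return (active_ratio
            + counter_to_leak * activity
            + l2_to_leak * extra_misses)


# Hypothetical inputs, chosen only to show the arithmetic
power = normalized_leakage_power(active_ratio=0.30, counter_to_leak=0.01,
                                 activity=0.5, l2_to_leak=9.0,
                                 extra_misses=0.002)
```

With these made-up inputs the three terms are 0.30, 0.005, and 0.018. A result below 1.0 means decay saves leakage energy overall despite the counter and extra-miss overheads.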
Skier's Dilemma
• New skis: $400
• Ski rentals: $20
• Heuristic: buy skis once the rental cost equals the purchase price

Ski trips:   5     10    15    20    25    50
Optimal:    $100  $200  $300  $400  $400  $400
Heuristic:  $100  $200  $300  $800  $800  $800

• Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far
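The table can be reproduced with a few lines of code. This sketch assumes, as the slide's numbers imply, that the heuristic buys as soon as cumulative rental spending reaches the purchase price; the function names are illustrative.

```python
RENTAL = 20     # cost of one ski rental (from the slide)
PURCHASE = 400  # cost of new skis (from the slide)


def optimal_cost(trips):
    """Cost with perfect knowledge of the number of trips."""
    return min(trips * RENTAL, PURCHASE)


def heuristic_cost(trips):
    """Rent until rental spending equals the purchase price, then buy."""
    rented = trips * RENTAL
    if rented < PURCHASE:
        return rented
    # Rentals worth one purchase price were paid before the skis were bought
    return PURCHASE + PURCHASE


for trips in (5, 10, 15, 20, 25, 50):
    print(trips, optimal_cost(trips), heuristic_cost(trips))
```

In every column the heuristic costs at most twice the optimum, which is the property that motivates using the same rule to pick a decay interval.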
Tracking Dead Time
• Each line has a 2-bit counter that gets reset on every access and gets incremented every 2500 cycles through a global signal (negligible overhead)
• After 10,000 clock cycles, the counter reaches the max value and triggers a decay
• Adaptive decay: start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
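A rough software model of the per-line counter and the adaptive rule is sketched below. It is one possible reading of the slide, not the paper's circuit: the sketch assumes that a tick arriving while the 2-bit counter is already saturated is what gates the line off (which gives four ticks, or 10,000 cycles), and the bounds on the adaptive period are hypothetical.

```python
TICK_INTERVAL = 2500  # cycles between global ticks (from the slide)
COUNTER_MAX = 3       # largest value a 2-bit counter can hold


class DecayingLine:
    """One cache line with a 2-bit decay counter."""

    def __init__(self):
        self.counter = 0
        self.powered = True

    def access(self):
        """Reset the counter; report whether the line had decayed before this access."""
        was_decayed = not self.powered
        self.counter = 0
        self.powered = True  # a decayed line is refetched and powered back on
        return was_decayed

    def global_tick(self):
        """Called once every TICK_INTERVAL cycles for every line."""
        if not self.powered:
            return
        if self.counter == COUNTER_MAX:
            self.powered = False  # assumed: a tick on a saturated counter triggers decay
        else:
            self.counter += 1


def cycles_until_decay():
    """Cycles an untouched line stays powered, counted from a tick boundary."""
    line = DecayingLine()
    cycles = 0
    while line.powered:
        cycles += TICK_INTERVAL
        line.global_tick()
    return cycles


def adapt_decay_interval(interval, quick_miss, shortest=1_000, longest=512_000):
    """Adaptive rule from the slide; the shortest/longest bounds are assumptions."""
    if quick_miss:
        return min(interval * 2, longest)  # decayed too early: back off
    return max(interval // 2, shortest)    # no miss: decay more aggressively
```

Under these assumptions `cycles_until_decay()` returns 10,000, matching the slide, and any access in between restarts the count from zero.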
Other Results
• L2 cache is equally suitable to decay techniques: lifetimes are scaled by a factor of 10, but an extra miss also costs a lot more
• For their experiments, there is little interference from multiprogramming
• Some instructions can easily be identified as last touches to a cache block, giving potential for early cache decay
• Can this apply to the branch predictor or the register file?