Prediction of CPU Idle-Busy Activity Pattern. Authors: Qian Diao, Justin Song. Presented by: Justin Song, Intel Corporation. 14th International Symposium on High-Performance Computer Architecture, Salt Lake City, UT, Feb 18, 2008.
Agenda • Introduction • Usage model • CPU idle pattern • Prediction algorithm • Result • Benefit analysis • Summary & future work
Problem • C-state: CPU idle state (no instructions being executed) • C-state-based CPU power management: potentially big benefit • Workloads rarely saturate a multi-core CPU • C-state technology is maturing • Lower power, higher compute efficiency, Si support • How C-states are used today is broken • Today only OSPM selects the C-state for each logical CPU (core/thread) • Many wrong decisions: performance regression or wasted power • Performance concerns may prevent deep C-states from being enabled • Figure: ACPI table • Linux C-state policy, by last C-state: • C1: after 4 consecutive idles longer than C2's latency, choose C2 for the next idle • C2: after 10 consecutive idles longer than C3's latency, choose C3 for the next idle • C2/C3: if the last idle is shorter than C2's/C3's latency, demote
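The Linux promotion/demotion rule listed above can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the class and constant names are hypothetical, and the latency values are made up (a real kernel reads them from the ACPI C-state table). Only the 4/10 promotion counts and the demotion condition come from the slide.

```python
from dataclasses import dataclass

# Hypothetical exit latencies (us); real values come from the ACPI C-state table.
LATENCY_US = {"C2": 20.0, "C3": 100.0}
# From the slide: promote after this many consecutive idles longer than the target's latency.
PROMOTE = {"C1": ("C2", 4), "C2": ("C3", 10)}
# From the slide: demote when the last idle was shorter than the current state's latency.
DEMOTE = {"C2": "C1", "C3": "C2"}


@dataclass
class LastCPolicy:
    """Hypothetical model of the 'last C-state' policy described on the slide."""
    current: str = "C1"
    streak: int = 0  # consecutive idles long enough to justify promotion

    def next_state(self, last_idle_us: float) -> str:
        """Choose the C-state for the next idle period from the last idle's duration."""
        if self.current in DEMOTE and last_idle_us < LATENCY_US[self.current]:
            self.current = DEMOTE[self.current]
            self.streak = 0
            return self.current
        if self.current in PROMOTE:
            target, needed = PROMOTE[self.current]
            if last_idle_us > LATENCY_US[target]:
                self.streak += 1
                if self.streak >= needed:
                    self.current = target
                    self.streak = 0
            else:
                self.streak = 0  # the run of long idles is broken
        return self.current
```

Because the rule looks only backward at the last idles, a workload whose idle lengths change abruptly is handled late in both directions, which is the kind of wrong decision the slide attributes to OSPM-only selection.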
Good Prediction Helps • No worry about performance drop • Possible causes of performance degradation from deep C-states: • Upcoming C0% too high (e.g. >90%): no headroom to accommodate deep C-states • Equivalently, the upcoming idle duration is too short to amortize the deep C-state's latency • Under some circumstances, deep C-states prevent proprietary Si optimizations for performance compensation from working • Thread context loss • On-core/package cache flush • Deep C-states' power benefit maximized
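The two equivalent conditions above (too little headroom, or an idle too short to amortize the deep state) can be written as simple checks on a predicted value. This is a hedged sketch: the function names, units, and the simplified energy model (energy saved over the whole idle versus a fixed transition energy) are assumptions for illustration; only the 90% example threshold is from the slide.

```python
def has_headroom(predicted_c0_pct: float, limit_pct: float = 90.0) -> bool:
    """Deep C-states are worth considering only if the predicted busy (C0) share
    leaves headroom; 90% is the slide's example threshold."""
    return predicted_c0_pct <= limit_pct


def deep_c_pays_off(idle_us: float, latency_us: float,
                    power_shallow_mw: float, power_deep_mw: float,
                    transition_energy_uj: float) -> bool:
    """Simplified amortization test for entering a deeper C-state."""
    if idle_us <= latency_us:
        return False  # the idle ends before the exit latency is covered
    # mW * us = nJ; divide by 1000 to compare against the transition energy in uJ.
    saved_uj = (power_shallow_mw - power_deep_mw) * idle_us / 1000.0
    return saved_uj > transition_energy_uj
```

With hypothetical numbers (shallow 1000 mW, deep 500 mW, 100 uJ transition, 100 us latency), a 500 us idle pays off while a 150 us idle does not, even though both exceed the latency.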
Our Methodology • Modeling problem • Use easy-to-observe metrics • Needs domain-knowledge assistance (Si PM optimization) • Prediction model: DBN (Dynamic Bayesian Networks) • Generalization of HMM and LDS (KFM) • Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference • Model evaluation • Model simplification • For deployment in SW/FW/HW • Power benefit / performance impact quantification
Usage Model • Use the activity prediction result to direct C-state usage and performance compensation
CPU Package Activity State • Package activity: idle-busy activity of all cores • All-core-idle • All-core-busy • Package partial idle (at least one core idle and one core busy) • All other 2^N-2 states (N = # of cores) • How PM benefits from the definition • The idle-busy pattern (not the OSPM-selected C-state) reflects the workload's timing nature • Aligned with shared-power-plane design • Only when all cores are idle can the package's memory and I/O control logic go to a lower power state • Only when at least one core is idle can the active cores' performance possibly be compensated • Breaking down package partial idle preserves core-location information • Figure: Quad-core CPU package activity state change over time
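The package-state definition and the two PM consequences above reduce to a classification over per-core idle flags. The sketch below is illustrative; the function names and dictionary keys are hypothetical.

```python
from typing import Dict, Sequence


def package_state(core_idle: Sequence[bool]) -> str:
    """Map per-core idle flags (True = idle) to the package activity state."""
    if all(core_idle):
        return "all-idle"
    if not any(core_idle):
        return "all-busy"
    # Any of the remaining 2^N - 2 combinations: at least one idle and one busy core.
    return "partial-idle"


def pm_opportunities(core_idle: Sequence[bool]) -> Dict[str, bool]:
    """Which shared-power-plane actions the current package state permits."""
    state = package_state(core_idle)
    return {
        # Memory and I/O control logic can drop to lower power only when every core idles.
        "uncore_low_power": state == "all-idle",
        # Active cores can be compensated only when some core idles and some core runs.
        "perf_compensation": state == "partial-idle",
    }
```

For a quad-core package, 14 of the 16 per-core combinations fall into partial idle, which is why keeping the per-combination breakdown matters for location-aware optimization.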
CPU Idle Pattern • Definition: residency% of each package activity state during an observation time slot • How prediction benefits from the definition • Predicting the package idle pattern: the random variable becomes discrete • Predicting idle duration directly: hard to use a discrete prediction model • A single core's idle-duration prediction cannot help whole-package power saving and performance compensation • Hard to know whether cores' idle periods overlap • Figure: Dual-core CPU package activity pattern over time
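The residency definition above can be computed from a trace of package states within one slot. This sketch assumes the trace arrives as consecutive (state, duration) segments covering the slot (for example a 500 µs slot); the segment format and function name are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[bool, ...]  # per-core idle flags, True = idle


def idle_pattern(segments: List[Tuple[State, float]]) -> Dict[State, float]:
    """Residency (%) of every package activity state within one observation slot.

    segments: consecutive (state, duration_us) pieces that together cover the slot.
    """
    n_cores = len(segments[0][0])
    total_us = sum(dur_us for _, dur_us in segments)
    time_in: Counter = Counter()
    for state, dur_us in segments:
        time_in[tuple(state)] += dur_us
    # Report all 2^N states, including unseen ones at 0%, so every slot has the same keys.
    return {s: 100.0 * time_in[s] / total_us
            for s in product((False, True), repeat=n_cores)}
```

For a dual-core package, a slot spent 250 µs all-busy, 200 µs all-idle, and 50 µs with only core 1 idle yields 50%, 40%, and 10%, with the fourth state at 0%. Each slot thus becomes one fixed-length vector regardless of how many idle periods it contained.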
Prediction Algorithm • Kalman Filter Model (KFM) used for prediction • The time series (observed CPU package patterns) is treated as a Markov process • Observation made every 500 µs • KFM generalized in Dynamic Bayesian Networks • Explicit probability definition (Bayesian theory) • Good network structure description (graph theory) • Algorithm • Inputs • T observed historical CPU package patterns; each state's percentage series is defined as an independent variable • A priori state transition, deviation, and observation covariance • Interim outputs • Hidden conditional probability distribution • Final outputs • Prediction of the (T+1)th CPU package pattern • Inference • Forward operator (1 to T) • Backward operator (T+1 back to 1) • Complexity (T: # of history observations; N: # of activity states) • O(TN^3)
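Because each state's percentage series is treated as an independent variable, the prediction can be illustrated with one scalar Kalman filter per series. This is a simplified sketch, not the authors' DBN: it assumes a random-walk state model (transition = 1, observation = 1) and hypothetical noise values `q` and `r`. It runs only the forward (filtering) pass, since the smoothed estimate at step T equals the filtered one, so one-step prediction of T+1 needs no backward pass.

```python
from typing import Dict, List, Optional, Tuple


def kf_predict_next(observations: List[float], q: float = 1.0, r: float = 4.0,
                    x0: Optional[float] = None, p0: float = 100.0) -> Tuple[float, float]:
    """Scalar Kalman filter, random-walk model: x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q, r: assumed process and observation noise variances (hypothetical defaults).
    Returns the mean and variance of the one-step-ahead prediction for slot T+1.
    """
    x = observations[0] if x0 is None else x0
    p = p0
    for y in observations:
        p = p + q              # predict: random walk keeps the mean, adds process noise
        k = p / (p + r)        # Kalman gain
        x = x + k * (y - x)    # correct the estimate with observation y
        p = (1.0 - k) * p      # posterior variance
    return x, p + q            # prediction for T+1 under the random-walk model


def predict_pattern(history: List[Dict[str, float]]) -> Dict[str, float]:
    """Predict each state's residency (%) independently from its own history."""
    states = history[0].keys()
    return {s: kf_predict_next([h[s] for h in history])[0] for s in states}
```

Per series this costs O(T); the O(TN^3) figure on the slide corresponds to a full model over N coupled states. Since the filters are independent, their outputs need not sum exactly to 100%, so a deployment might renormalize them.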
Algorithm Simplification • 2^N states → 3 states (all busy, all idle, partial idle) • One-step forward and backward computation • Forward: store step (T-1)'s intermediate results • Backward: compute only step (T+1) • Complexity of the simplified algorithm • Best case: O(1) • Worst case: O(T), when the stored intermediate results must be discarded and the computation started over • Figure: Co-processor-based prediction time estimate
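Both simplifications above can be illustrated together: collapsing the 2^N per-core combinations into 3 states, and keeping only the previous step's filter state so each new slot costs O(1). The names are hypothetical and the filter reuses an assumed random-walk model. The `restart` method models the O(T) worst case, where stored results are discarded and the retained history is refiltered.

```python
from typing import Dict, Iterable, Optional, Tuple


def aggregate(pattern: Dict[Tuple[bool, ...], float]) -> Dict[str, float]:
    """Collapse per-combination residencies (keys: per-core idle flags) into 3 states."""
    out = {"all-busy": 0.0, "all-idle": 0.0, "partial-idle": 0.0}
    for state, pct in pattern.items():
        if all(state):
            out["all-idle"] += pct
        elif not any(state):
            out["all-busy"] += pct
        else:
            out["partial-idle"] += pct
    return out


class OnlineKF:
    """Scalar random-walk Kalman filter that stores only the previous step's results."""

    def __init__(self, q: float = 1.0, r: float = 4.0, p0: float = 100.0) -> None:
        self.q, self.r, self.p0 = q, r, p0   # assumed noise parameters
        self.x: Optional[float] = None       # current estimate
        self.p = p0                          # current estimate variance

    def update(self, y: float) -> float:
        """O(1) step: fold in one slot's residency and return the prediction for the next."""
        if self.x is None:
            self.x = y
        self.p += self.q
        k = self.p / (self.p + self.r)
        self.x += k * (y - self.x)
        self.p *= 1.0 - k
        return self.x

    def restart(self, history: Iterable[float]) -> Optional[float]:
        """O(T) step: discard stored results and refilter the retained history."""
        self.x, self.p = None, self.p0
        for y in history:
            self.update(y)
        return self.x
```

One filter per aggregated state (three in total) keeps both memory and per-slot work constant, which suits the SW/FW/HW deployment targets mentioned in the methodology.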
Result • For a DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; 3 of them are independent, since the four percentages sum to 100%; no aggregation for partial idles • Figure: ground-truth value vs. predicted value; the distance from the ground truth is the prediction error • The smoothed estimate follows the observed values very well
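The dependence among the four DP variables and the error measure above can be made concrete. In this sketch the function names are hypothetical, and per-state absolute difference in percentage points is an assumed choice of distance, since the slide does not specify the metric.

```python
from typing import Dict


def complete_pattern(bb: float, bi: float, ib: float) -> Dict[str, float]:
    """Given three of the four DP residencies (%), the fourth follows from the 100% total."""
    return {"busy,busy": bb, "busy,idle": bi, "idle,busy": ib,
            "idle,idle": 100.0 - bb - bi - ib}


def prediction_error(pred: Dict[str, float], truth: Dict[str, float]) -> Dict[str, float]:
    """Per-state distance (percentage points) between predicted and ground-truth residency."""
    return {s: abs(pred[s] - truth[s]) for s in truth}
```

Predicting three series and deriving the fourth keeps the predicted pattern consistent with the 100% constraint, at the cost of concentrating the accumulated error in the derived state.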
Result – Cont'd • All-states prediction: useful for location-aware optimization • All-busy, all-idle, partial-idle prediction: useful for shared-power-plane optimization
Benefit Analysis Method • Trace idle-busy events on a real quad-core processor • Simulate OSPM C-state decision making (baseline) • Simulate C-state decisions based on the prediction result • Prediction error injected • Accumulate each C-state's power and transition energy cycle by cycle • Accumulated energy / run time = average power • Compare prediction-based C-state selection against the OSPM baseline
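The energy accounting above (accumulate state power and transition energy, then divide by run time) can be sketched over a trace of busy/idle periods. Everything here is an assumption for illustration: the power, transition-energy, and latency tables are made-up values, the chooser that scales the true idle length by `(1 + error)` is a hypothetical way to inject prediction error, and the model ignores how exit latency stretches run time.

```python
from typing import Callable, List, Tuple

# Hypothetical per-state power (mW), transition energy (uJ), and exit latency (us).
POWER_MW = {"C0": 1000.0, "C1": 500.0, "C2": 300.0, "C3": 100.0}
TRANSITION_UJ = {"C1": 0.0, "C2": 5.0, "C3": 50.0}
LATENCY_US = {"C1": 1.0, "C2": 20.0, "C3": 100.0}


def average_power_mw(trace: List[Tuple[float, float]],
                     choose: Callable[[float], str]) -> float:
    """trace: (busy_us, idle_us) pairs; choose(idle_us) picks the C-state for that idle."""
    energy_nj = 0.0
    runtime_us = 0.0
    for busy_us, idle_us in trace:
        energy_nj += POWER_MW["C0"] * busy_us               # mW * us = nJ
        c = choose(idle_us)
        energy_nj += POWER_MW[c] * idle_us + TRANSITION_UJ[c] * 1000.0  # uJ -> nJ
        runtime_us += busy_us + idle_us
    return energy_nj / runtime_us                          # nJ / us = mW


def choose_by_prediction(error: float = 0.0) -> Callable[[float], str]:
    """Chooser that sees the idle length scaled by (1 + error) and picks the deepest
    state whose latency fits the predicted idle."""
    def choose(idle_us: float) -> str:
        predicted = idle_us * (1.0 + error)
        fitting = [c for c, lat in LATENCY_US.items() if lat < predicted]
        return max(fitting, key=LATENCY_US.get) if fitting else "C1"
    return choose
```

An OSPM baseline can be plugged in as another `choose` function (for example, one wrapping the last-C-state policy), so both selections are scored on the same trace. On a single 100 µs busy / 900 µs idle period, these numbers give 550 mW for always-C1 and 240 mW for an accurate prediction.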
Benefit Result • *: power delta < transition energy (if OSPM selects C2/C3) or idle length > C2/C3 latency (if OSPM selects C1) • **: based on power numbers in figure 1 (source: ACPI spec [1]); they do not represent the real power of our experimental processor • ***: based on latency numbers in figure 1 (source: ACPI spec [1]); they do not represent the real latency of our experimental processor
Summary & Future Work • Good problem modeling and prediction are key to fully exploiting deep C-states' power benefit • The KFM model works for CPU package pattern prediction on SPECWeb • Future work: evaluate more workloads under more general assumptions