Prediction of CPU Idle-Busy Activity Pattern

Authors: Qian Diao, Justin Song. Presented by: Justin Song, Intel Corporation. 14th International Symposium on High-Performance Computer Architecture, Salt Lake City, UT, Feb 18, 2008.

Presentation Transcript


  1. Prediction of CPU Idle-Busy Activity Pattern
     Authors: Qian Diao, Justin Song
     Presented by: Justin Song, Intel Corporation
     14th International Symposium on High-Performance Computer Architecture, Salt Lake City, UT, Feb 18, 2008

  2. Agenda
     • Introduction
     • Usage model
     • CPU idle pattern
     • Prediction algorithm
     • Result
     • Benefit analysis
     • Summary & future work

  3. Problem
     • C-state: CPU idle state (no instructions being executed)
     • C-state-based CPU power management: potentially big benefit
        • Workloads rarely saturate a multi-core CPU
        • C-state technology is maturing: lower power, higher compute efficiency, silicon (Si) support
     • How C-states are used today: broken
        • Only the OSPM (OS-directed power management) selects the C-state for each logical CPU (core/thread)
        • Many wrong decisions: performance regression or wasted power
        • Performance concerns may prevent deep C-states from being enabled
     [Figure: ACPI table]
     Linux C-state policy, by last C-state (.lat = latency):
        • C1: after 4 consecutive idles > C2.lat, choose C2 for the next C-state
        • C2: after 10 consecutive idles > C3.lat, choose C3 for the next C-state
        • C2/C3: if the last idle < C2/C3.lat, demote
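
The Linux C-state policy summarized above can be read as a promote/demote state machine keyed on the last C-state. The Python sketch below is a hypothetical illustration, not kernel code. `CStateGovernor`, `LATENCY_US`, and `PROMOTE_COUNT` are names invented here, and the latency values are placeholders rather than numbers from any ACPI table. Demoting one level at a time is also an assumption, because the slide only says "demote".

```python
from dataclasses import dataclass

# Hypothetical exit latencies in microseconds (placeholders, not from a real ACPI table).
LATENCY_US = {2: 20.0, 3: 100.0}
# Consecutive long idles needed to promote out of C1 and C2, as listed on the slide.
PROMOTE_COUNT = {1: 4, 2: 10}


@dataclass
class CStateGovernor:
    state: int = 1   # C-state chosen for the next idle (1, 2, or 3)
    streak: int = 0  # consecutive idles longer than the next deeper state's latency

    def on_idle_end(self, idle_us: float) -> int:
        """Record the length of the idle that just ended; return the C-state for the next idle."""
        # Demote: the last idle was shorter than the current C2/C3 latency.
        if self.state in LATENCY_US and idle_us < LATENCY_US[self.state]:
            self.state -= 1
            self.streak = 0
            return self.state
        # Promote: count consecutive idles longer than the next deeper state's latency.
        if self.state in PROMOTE_COUNT:
            if idle_us > LATENCY_US[self.state + 1]:
                self.streak += 1
                if self.streak >= PROMOTE_COUNT[self.state]:
                    self.state += 1
                    self.streak = 0
            else:
                self.streak = 0
        return self.state
```

Starting from C1, four 50 µs idles promote the governor to C2. A following 10 µs idle, which is shorter than the assumed C2 latency, demotes it back to C1. Because this policy only looks backward, it is exactly the kind of decision maker that a wrong guess turns into a performance regression or wasted power.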

  4. GOOD Prediction Helps
     • No need to worry about performance drops
        • Possible causes of performance degradation from deep C-states:
           • Upcoming C0% too high (e.g., >90%): no headroom to accommodate a deep C-state
             (equivalently: the upcoming idle duration is too short to amortize the deep C-state's latency)
           • Under some circumstances, a deep C-state prevents proprietary silicon optimizations for performance compensation from working
              • Thread context loss
              • On-core/package cache flush
     • Deep C-state power benefit maximized
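
The amortization argument on this slide can be made concrete as a break-even check on the predicted C0%. The sketch below rests on several assumptions:

- It uses a hypothetical three-entry C-state table; the power, exit latency, and transition energy values are made up.
- It uses the 500 µs observation slot mentioned on slide 12.
- It makes the optimistic simplification that all idle time in the slot is one contiguous period.

`pick_cstate` is a hypothetical helper, not the authors' method.

```python
from typing import NamedTuple


class CState(NamedTuple):
    name: str
    power_mw: float         # residency power in this state (hypothetical)
    exit_latency_us: float  # time to return to C0 (hypothetical)
    transition_uj: float    # extra energy per entry/exit (hypothetical)


# Hypothetical table ordered from shallow to deep; not real processor numbers.
CSTATES = [
    CState("C1", 1000.0, 1.0, 0.0),
    CState("C2", 500.0, 20.0, 5.0),
    CState("C3", 100.0, 100.0, 40.0),
]
SLOT_US = 500.0  # observation slot length used in the talk


def pick_cstate(predicted_c0: float) -> str:
    """Deepest C-state whose latency and transition energy the predicted idle time amortizes."""
    # Optimistic assumption: the slot's idle time forms one contiguous idle period.
    idle_us = (1.0 - predicted_c0) * SLOT_US
    shallow = CSTATES[0]
    best = shallow.name
    for s in CSTATES[1:]:
        # Energy saved versus staying in C1: mW * us = nJ, so divide by 1000 for uJ.
        saved_uj = (shallow.power_mw - s.power_mw) * idle_us / 1000.0
        if idle_us > s.exit_latency_us and saved_uj > s.transition_uj:
            best = s.name
    return best
```

With these placeholder numbers, a predicted C0% of 20% leaves 400 µs idle and selects C3. At 95% only about 25 µs remains, so the choice falls back to C2. At 99% the policy stays in C1.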

  5. Our Methodology
     • Modeling the problem
        • Use easy-to-observe metrics
        • Domain knowledge needed to assist (Si PM optimization)
     • Prediction model: DBN (Dynamic Bayesian Networks)
        • Generalization of HMMs (hidden Markov models) and LDSs (linear dynamical systems, i.e., Kalman filter models, KFM)
        • Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference
     • Model evaluation
     • Model simplification
        • For deployment in SW/FW/HW
     • Power benefit / performance impact quantification

  6. Agenda (repeat of slide 2; next section: Usage model)

  7. Usage Model
     • Use activity prediction results to direct C-state usage and performance compensation

  8. Agenda (repeat of slide 2; next section: CPU idle pattern)

  9. CPU Package Activity State
     • Package activity: the idle-busy activity of all cores
        • All-core-idle
        • All-core-busy
        • Package partial idle (at least one core idle and at least one core busy)
           • All other 2^N - 2 states (N = number of cores)
     • How power management benefits from this definition
        • The idle-busy pattern (not the OSPM-selected C-state) reflects the workload's timing behavior
        • Aligned with shared-power-plane design
           • Only when all cores are idle can the package's memory and I/O control logic enter a lower-power state
           • Only when at least one core is idle can the active cores' performance possibly be compensated
        • Breaking down package partial idle yields core location information
     [Figure: Quad-core CPU package activity state change over time]
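
The state definition can be made concrete by enumeration. The sketch below lists all 2^N per-core busy/idle combinations and groups them into the three package classes, so the partial-idle group always has 2^N - 2 members. `idle_cores` recovers the core location information that the partial-idle breakdown provides. All function names are hypothetical.

```python
from itertools import product

State = tuple[bool, ...]  # one entry per core: True = busy, False = idle


def classify(state: State) -> str:
    """Map a per-core busy/idle combination to its package activity class."""
    if all(state):
        return "all-busy"
    if not any(state):
        return "all-idle"
    return "partial-idle"


def enumerate_states(n_cores: int) -> dict[str, list[State]]:
    """Group all 2^N combinations for an N-core package by class."""
    groups: dict[str, list[State]] = {"all-busy": [], "all-idle": [], "partial-idle": []}
    for state in product((False, True), repeat=n_cores):
        groups[classify(state)].append(state)
    return groups


def idle_cores(state: State) -> list[int]:
    """Core location information: indices of the idle cores in a package state."""
    return [i for i, busy in enumerate(state) if not busy]
```

For the quad-core case on this slide, `enumerate_states(4)` yields 16 states: one all-busy, one all-idle, and 14 partial-idle.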

  10. CPU Idle Pattern
     • Definition: the residency% of each package activity state during an observation time slot
     • How prediction benefits from this definition
        • Predicting the package idle pattern: the random variable becomes discrete
        • Predicting idle duration: hard to use a discrete prediction model
        • A single core's idle-duration prediction cannot help package-wide power saving and performance compensation
           • Hard to know whether the cores' idle periods overlap
     [Figure: Dual-core CPU package activity pattern over time]
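
The residency% definition can be sketched as a histogram over one slot. This sketch makes a hypothetical assumption: each slot is available as fixed-rate samples of per-core state (1 = busy, 0 = idle). The talk does not describe how its trace is sampled, and `idle_pattern` is an illustrative name.

```python
from collections import Counter

Sample = tuple[int, ...]  # per-core state at one sampling tick: 1 = busy, 0 = idle


def idle_pattern(samples: list[Sample]) -> dict[Sample, float]:
    """Residency% of each package activity state observed in one slot."""
    if not samples:
        raise ValueError("slot contains no samples")
    counts = Counter(samples)
    return {state: 100.0 * n / len(samples) for state, n in counts.items()}
```

Consider the dual-core slots `[(0, 0), (0, 0), (1, 1), (1, 1)]` and `[(0, 1), (0, 1), (1, 0), (1, 0)]`. In both, each core is idle 50% of the time. However, the (idle, idle) residency is 50% in the first slot and 0% in the second. This is why per-core idle prediction cannot tell whether idle periods overlap.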

  11. Agenda (repeat of slide 2; next section: Prediction algorithm)

  12. Prediction Algorithm
     • Kalman Filter Model (KFM) used for prediction
        • The time series (observed CPU package patterns) is modeled as a Markov process
        • An observation is made every 500 µs
     • KFM generalized as a Dynamic Bayesian Network
        • Explicit probability definition (Bayesian theory)
        • Good network structure description (graph theory)
     • Algorithm
        • Inputs
           • T observed historical CPU package patterns; each state's percentage series is defined as an independent variable
           • A-priori state transition, deviation, and observation covariance
        • Interim outputs
           • Hidden conditional probability distribution
        • Final outputs
           • Prediction of the (T+1)th CPU package pattern
        • Inference
           • Forward operator (1 to T)
           • Backward operator (T+1 back to 1)
        • Complexity (T: number of historical observations; N: number of activity states)
           • O(TN^3)
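
The slide applies KFM inference inside a DBN. The sketch below swaps in a simplified stand-in: the textbook scalar Kalman filter runs forward over t = 1..T, and the Rauch-Tung-Striebel smoother runs backward. It processes one series at a time, matching the slide's choice to treat each state's percentage series as an independent variable. The random-walk transition and the noise and prior values in `KFParams` are assumptions, not parameters from the talk. Because each series is filtered separately, the cost here is O(T) per series rather than the O(TN^3) of a joint N-dimensional filter.

```python
from typing import NamedTuple


class KFParams(NamedTuple):
    a: float = 1.0     # state transition coefficient (random walk; assumed)
    q: float = 4.0     # process noise variance (assumed)
    r: float = 9.0     # observation noise variance (assumed)
    x0: float = 50.0   # prior mean of the residency% (assumed)
    p0: float = 100.0  # prior variance (assumed)


def kalman_forward(ys: list[float], prm: KFParams) -> tuple[list[float], list[float], list[float], list[float]]:
    """Forward pass t = 1..T: filtered and one-step-predicted means and variances."""
    x, p = prm.x0, prm.p0
    xf: list[float] = []
    pf: list[float] = []
    xp: list[float] = []
    pp: list[float] = []
    for y in ys:
        # Predict: push the previous estimate through the transition model.
        x_pred = prm.a * x
        p_pred = prm.a * prm.a * p + prm.q
        # Update: correct the prediction with observation y via the Kalman gain.
        k = p_pred / (p_pred + prm.r)
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x)
        pf.append(p)
    return xf, pf, xp, pp


def predict_next(ys: list[float], prm: KFParams) -> tuple[float, float]:
    """Mean and variance of the predicted observation at T+1."""
    xf, pf, _, _ = kalman_forward(ys, prm)
    return prm.a * xf[-1], prm.a * prm.a * pf[-1] + prm.q + prm.r


def rts_smooth(ys: list[float], prm: KFParams) -> list[float]:
    """Backward Rauch-Tung-Striebel pass from T down to 1: smoothed means."""
    xf, pf, xp, pp = kalman_forward(ys, prm)
    xs = list(xf)
    for t in range(len(ys) - 2, -1, -1):
        gain = pf[t] * prm.a / pp[t + 1]
        xs[t] = xf[t] + gain * (xs[t + 1] - xp[t + 1])
    return xs
```

As an example, `predict_next([60.0] * 50, KFParams())` converges to a predicted residency of about 60%, with a variance slightly above the assumed observation noise.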

  13. Algorithm Simplification
     • 2^N states → 3 states (all busy, all idle, partial idle)
     • One-step forward and backward computation
        • Forward: store the (T-1)th intermediate results
        • Backward: compute only the (T+1)th
     • Complexity of the simplified algorithm
        • Best case: O(1)
        • Worst case: O(T), when the historical intermediate results must be discarded and the computation restarted
     [Figure: Co-processor-based prediction time estimate]
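
One way to get the slide's best and worst cases is to keep the last filtered estimate so that each new observation costs O(1). The retained history (O(T)) is then refiltered only when the stored results must be discarded. The sketch below is minimal and uses an assumed random-walk model; `OnlinePredictor` is hypothetical, and the talk does not say what triggers a restart.

```python
from dataclasses import dataclass


@dataclass
class OnlinePredictor:
    """Incremental scalar Kalman predictor for one residency% series (assumed random-walk model)."""
    q: float = 4.0    # process noise variance (assumed)
    r: float = 9.0    # observation noise variance (assumed)
    x: float = 50.0   # filtered mean after the latest observation
    p: float = 100.0  # filtered variance after the latest observation

    def update(self, y: float) -> float:
        """O(1): fold in observation T and return the predicted mean for T+1."""
        p_pred = self.p + self.q
        k = p_pred / (p_pred + self.r)
        self.x += k * (y - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x  # random walk: the T+1 prediction equals the filtered mean at T

    def restart(self, history: list[float], x0: float = 50.0, p0: float = 100.0) -> float:
        """O(T): discard stored intermediate results and refilter the retained history."""
        self.x, self.p = x0, p0
        for y in history:
            self.update(y)
        return self.x
```

One instance per aggregated class (all busy, all idle, partial idle) gives a three-state predictor. The three independently predicted percentages need not sum exactly to 100%.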

  14. Agenda (repeat of slide 2; next section: Result)

  15. Result
     • For a DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; since they sum to 100%, only 3 of them are independent; no aggregation of partial idles
     [Figure: ground-truth value vs. predicted value; the distance from the ground truth is the prediction error; the smoothed curve follows the observed curve very well]
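
Two small pieces of arithmetic sit behind this slide: recovering the dependent fourth percentage, and measuring prediction error. The sketch below shows both. Using the absolute difference in percentage points is an assumption, since the slide only says the error is the distance from the ground truth. Both function names are hypothetical.

```python
def idle_idle_pct(bb: float, bi: float, ib: float) -> float:
    """(idle, idle)% from the other three dual-core residencies, which together sum to 100%."""
    return 100.0 - bb - bi - ib


def prediction_errors(truth: list[float], predicted: list[float]) -> list[float]:
    """Per-slot error: absolute distance in percentage points from the ground truth (assumed metric)."""
    if len(truth) != len(predicted):
        raise ValueError("series lengths differ")
    return [abs(t - p) for t, p in zip(truth, predicted)]
```

For example, residencies of 40%, 20%, and 15% for (busy, busy), (busy, idle), and (idle, busy) leave 25% for (idle, idle).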

  16. Result – Cont'd
     • All-states prediction: useful for location-aware optimization
     • All-busy, all-idle, partial-idle prediction: useful for shared-power-plane optimization

  17. Agenda (repeat of slide 2; next section: Benefit analysis)

  18. Benefit Analysis Method
     • Trace idle-busy events on a real quad-core processor
     • Simulate OSPM C-state decision making (baseline)
     • Simulate C-state decisions based on the prediction result
        • With prediction error injected
     • Accumulate each C-state's power and transition energy cycle by cycle
        • Average power = accumulated energy / run time
     • Compare prediction-based C-state selection against the OSPM baseline
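
The benefit-analysis flow can be sketched as an energy accumulator over a trace of busy/idle periods. This version is simplified and hypothetical in several ways:

- It accumulates energy per idle period rather than cycle by cycle.
- It uses made-up power and transition-energy numbers.
- It ignores the performance cost of exit latency.
- It replaces the OSPM baseline with an always-C1 stand-in.
- It models the predictor as the true idle length with uniform multiplicative error injected.

`average_power_mw`, `shallow_baseline`, and `prediction_policy` are illustrative names only.

```python
import random
from typing import Callable, NamedTuple


class CState(NamedTuple):
    power_mw: float       # residency power while idle in this state (hypothetical)
    transition_uj: float  # energy per entry/exit pair (hypothetical)


C0_POWER_MW = 2000.0  # hypothetical active power
CSTATES = {"C1": CState(1000.0, 0.0), "C2": CState(500.0, 5.0), "C3": CState(100.0, 40.0)}

Policy = Callable[[float], str]  # maps an idle length (us) to a C-state name


def average_power_mw(trace: list[tuple[float, float]], choose: Policy) -> float:
    """Accumulate energy over (busy_us, idle_us) periods and return energy / run time in mW."""
    energy_uj = 0.0
    runtime_us = 0.0
    for busy_us, idle_us in trace:
        cs = CSTATES[choose(idle_us)]
        energy_uj += C0_POWER_MW * busy_us / 1000.0  # mW * us = nJ; /1000 -> uJ
        energy_uj += cs.power_mw * idle_us / 1000.0 + cs.transition_uj
        runtime_us += busy_us + idle_us
    return energy_uj / runtime_us * 1000.0  # uJ / us = W; *1000 -> mW


def shallow_baseline(idle_us: float) -> str:
    """Stand-in baseline that always picks C1 (not the real OSPM policy)."""
    return "C1"


def prediction_policy(err_frac: float, seed: int = 0) -> Policy:
    """Hypothetical predictor: true idle length with uniform multiplicative error injected."""
    rng = random.Random(seed)

    def choose(idle_us: float) -> str:
        predicted = idle_us * (1.0 + rng.uniform(-err_frac, err_frac))
        if predicted > 200.0:  # hypothetical thresholds
            return "C3"
        if predicted > 50.0:
            return "C2"
        return "C1"

    return choose
```

On a trace of ten (100 µs busy, 400 µs idle) periods, the always-C1 stand-in averages 1200 mW, while the error-free predictor picks C3 and averages 560 mW. On very short idles, forcing C3 costs more than staying in C1, because the transition energy is never amortized.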

  19. Benefit Result
     *: power delta < transition energy (if OSPM selects C2/C3) or idle length > C2/C3 latency (if OSPM selects C1)
     **: based on power numbers in figure 1 (source: ACPI spec [1]); does not represent the real power of our experimental processor
     ***: based on latency numbers in figure 1 (source: ACPI spec [1]); does not represent the real latency of our experimental processor

  20. Agenda (repeat of slide 2; next section: Summary & future work)

  21. Summary & Future Work
     • Good problem modeling and prediction are key to fully exploiting the power benefit of deep C-states
     • The KFM works for CPU package pattern prediction on SPECweb
     • Future work: evaluate more workloads under more general assumptions

  22. Q & A
