Prediction of CPU Idle-Busy Activity Pattern

Authors: Qian Diao, Justin Song
Presented by: Justin Song, Intel Corporation
14th International Symposium on High-Performance Computer Architecture, Salt Lake City, UT, Feb 18, 2008

Agenda • Introduction • Usage model • CPU idle pattern • Prediction algorithm • Result • Benefit analysis • Summary & future work

Problem
• C-state: CPU idle state (no instructions being executed)
• C-state based CPU power management: potentially big benefit
  • Workloads rarely saturate a multi-core CPU
  • C-state technology is maturing: lower power, higher compute efficiency, Si support
• How C-states are used today: broken
  • Only OSPM selects the C-state for each logical CPU (core/thread)
  • Many wrong decisions, causing either performance regression or power waste
  • Performance concerns may prevent deep C-states from being enabled
[Figure: ACPI table]
Linux C-state policy
• Depending on the last C-state:
  • C1: after 4 consecutive idles > C2.lat, choose C2 for the next C-state
  • C2: after 10 consecutive idles > C3.lat, choose C3 for the next C-state
  • C2/C3: if the last idle < C2/C3.lat, demote
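The promotion/demotion rules above can be sketched as a small state machine. This is only an illustration of the rules as stated on the slide, not the actual Linux implementation: the class name `LadderPolicy`, its fields, and the reading of "4 consecutive idles > C2.lat" as "four idle periods in a row, each longer than the C2 latency" are assumptions, and the latency values in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class LadderPolicy:
    """Hypothetical sketch of the slide's promote/demote rules."""
    c2_latency_us: float
    c3_latency_us: float
    state: str = "C1"
    streak: int = 0  # consecutive idles long enough to justify promotion

    def next_state(self, last_idle_us: float) -> str:
        """Return the C-state to use for the next idle period."""
        if self.state == "C1":
            # Promote to C2 after 4 consecutive idles longer than C2 latency.
            self.streak = self.streak + 1 if last_idle_us > self.c2_latency_us else 0
            if self.streak >= 4:
                self.state, self.streak = "C2", 0
        elif self.state == "C2":
            if last_idle_us < self.c2_latency_us:
                # Last idle was too short for C2: demote.
                self.state, self.streak = "C1", 0
            else:
                # Promote to C3 after 10 consecutive idles longer than C3 latency.
                self.streak = self.streak + 1 if last_idle_us > self.c3_latency_us else 0
                if self.streak >= 10:
                    self.state, self.streak = "C3", 0
        else:  # C3
            if last_idle_us < self.c3_latency_us:
                self.state, self.streak = "C2", 0
        return self.state


# Usage with made-up latencies (10 us for C2, 100 us for C3).
policy = LadderPolicy(c2_latency_us=10.0, c3_latency_us=100.0)
trace = [policy.next_state(50.0) for _ in range(4)]
```

Because the policy only reacts to past idle lengths, it needs several idle periods before it promotes and one mispredicted idle before it demotes, which is the source of the wrong decisions the slide refers to.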

Good Prediction Helps
• No worry about performance drop
  • Possible causes for a deep C-state to degrade performance:
    • Coming C0% is too high (e.g. >90%), leaving no headroom to accommodate a deep C-state
      • Equivalent statement: the coming idle duration is too short, so the deep C-state's latency cannot be amortized
    • A deep C-state, under some circumstances, prevents proprietary Si optimizations for performance compensation from working
    • Thread context loss
    • On-core/package cache flush
• Deep C-state's power benefit is maximized

Our Methodology
• Modeling problem
  • Use easy-to-observe metrics
  • Need domain knowledge assistance (Si PM optimization)
• Prediction model: DBN (Dynamic Bayesian Network)
  • Generalization of HMM and LDS (KFM)
  • Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference
• Model evaluation
• Model simplification
  • For deployment in SW/FW/HW
• Power benefit / perf impact quantification

Usage Model
Use the activity prediction result to direct C-state usage and performance compensation

CPU Package Activity State
• Package activity: the idle-busy activity of all cores
  • All-core-idle
  • All-core-busy
  • Package partial idle (at least one core idle and one core busy)
    • All other 2^N-2 states (N = # of cores)
• How PM benefits from the definition
  • The idle-busy pattern (not the OSPM-selected C-state) reflects the workload's timing nature
  • Aligned with shared-power-plane design
    • Only when all cores are idle can the package's memory and I/O control logic go to a lower power state
    • Only when at least one core is idle can the active cores' performance be compensated
  • Break-down of package partial idle → core location information
[Figure: Quad-core CPU package activity state change over time]
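As a minimal sketch of the three package activity states defined above, the function below maps a per-core busy/idle snapshot to a package state. The function name and the boolean encoding (`True` = busy) are assumptions made for illustration.

```python
from itertools import product


def package_state(core_busy):
    """Classify one snapshot of per-core activity (True = busy, False = idle)."""
    if len(core_busy) == 0:
        raise ValueError("need at least one core")
    if all(core_busy):
        return "all-busy"
    if not any(core_busy):
        return "all-idle"
    return "partial-idle"


# Enumerate every per-core combination for a quad-core package and count
# how many fall into "partial-idle".
n_cores = 4
partial = [c for c in product([False, True], repeat=n_cores)
           if package_state(c) == "partial-idle"]
print(len(partial))  # 2**4 - 2 = 14
```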

CPU Idle Pattern
• Definition: residency% of each package activity state during an observation time slot
• How prediction benefits from the definition
  • Prediction of package idle pattern: the random variable becomes discrete
    • Prediction of idle duration: hard to use a discrete prediction model
  • Single-core idle duration prediction cannot help power saving and performance compensation for the whole CPU package
    • Hard to know whether the cores' idle periods overlap
[Figure: Dual-core CPU package activity pattern over time]
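The residency definition can be illustrated with a short sketch. It assumes, hypothetically, that one observation slot is available as a list of `(duration, per-core busy tuple)` segments; the function name and input layout are not from the slides.

```python
from collections import defaultdict


def idle_pattern(segments):
    """Return residency% per package activity state for one observation slot.

    segments: list of (duration_us, core_busy) pairs covering the slot, where
    core_busy is a tuple of booleans (True = busy), one entry per core.
    """
    total = sum(duration for duration, _ in segments)
    if total <= 0:
        raise ValueError("slot must have positive total duration")
    residency = defaultdict(float)
    for duration, cores in segments:
        residency[cores] += duration
    return {cores: 100.0 * t / total for cores, t in residency.items()}


# A made-up 500 us slot on a dual-core package.
slot = [
    (200.0, (True, True)),    # both cores busy
    (100.0, (True, False)),   # core 0 busy, core 1 idle
    (200.0, (False, False)),  # both cores idle
]
pattern = idle_pattern(slot)
```

The resulting percentages sum to 100, so for a package with K distinct states only K-1 of the series are independent.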

Prediction Algorithm
• Kalman Filter Model (KFM) used for prediction
  • The time series (observed CPU package patterns) is modeled as a Markov process
  • An observation is made every 500 µs
• KFM generalized in Dynamic Bayesian Networks
  • Explicit probability definition (Bayesian theory)
  • Good network structure description (graph theory)
• Algorithm
  • Inputs
    • T observed history CPU package patterns. Each state's percentage series is defined as an independent variable.
    • A-priori state transition, deviation, observation covariance
  • Interim outputs
    • Hidden conditional probability distribution
  • Final outputs
    • Prediction for the (T+1)th CPU package pattern
• Inference
  • Forward operator (1 to T)
  • Backward operator (T+1 back to 1)
• Complexity (T: # of history observations; N: # of activity states)
  • O(TN^3)
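To make the forward pass concrete, here is a minimal scalar Kalman filter that produces a one-step-ahead prediction for a single state's percentage series. It is a textbook filter under assumed parameters, not the authors' model: the transition coefficient `a`, process noise `q`, observation noise `r`, and initial variance `p0` are illustrative values, and the backward (smoothing) operator is omitted.

```python
def kalman_predict_next(observations, a=1.0, q=1.0, r=4.0, p0=100.0):
    """Run a scalar Kalman filter over the history and predict slot T+1.

    Assumed model: x[t] = a * x[t-1] + w,  y[t] = x[t] + v,
    with Var(w) = q and Var(v) = r. Returns (predicted mean, predicted variance).
    """
    if len(observations) == 0:
        raise ValueError("need at least one observation")
    x = float(observations[0])  # initial state estimate
    p = p0                      # initial estimate variance
    for y in observations:
        # Update: blend the prior estimate with the new observation.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        # Predict: propagate the estimate one slot ahead.
        x = a * x
        p = a * a * p + q
    return x, p


# Hypothetical history of all-idle residency% over the last few slots.
history = [30.0, 32.0, 29.0, 31.0, 35.0]
prediction, variance = kalman_predict_next(history)
```

In the multi-variable case the scalars become N x N matrices, and the matrix products and inversion in each step are what give the O(TN^3) cost quoted above.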

Algorithm Simplification
• 2^N states → 3 states (all busy, all idle, partial idle)
• One-step forward and backward computation
  • Forward: store the intermediate results of step (T-1)
  • Backward: just compute (T+1)
• Complexity of the simplified algorithm
  • Best case: O(1)
  • Worst case: O(T), when the stored intermediate results must be discarded and the computation started over
[Figure: Co-processor based prediction time estimate]
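Both simplifications can be sketched briefly: collapsing per-core-state residencies into three package states, and keeping the previous step's intermediate results so that each new observation costs O(1). The names `aggregate` and `IncrementalPredictor`, and the noise parameters, are assumptions for illustration; the filter is the same textbook scalar form with a random-walk transition, not the authors' deployed code.

```python
def aggregate(pattern):
    """Collapse residency% keyed by per-core busy tuples into three states."""
    out = {"all-busy": 0.0, "all-idle": 0.0, "partial-idle": 0.0}
    for cores, pct in pattern.items():
        if all(cores):
            out["all-busy"] += pct
        elif not any(cores):
            out["all-idle"] += pct
        else:
            out["partial-idle"] += pct
    return out


class IncrementalPredictor:
    """Scalar filter that stores only the last step's estimate and variance."""

    def __init__(self, q=1.0, r=4.0, p0=100.0):
        self.q, self.r, self.p0 = q, r, p0
        self.x = None   # stored estimate from the previous step
        self.p = p0     # stored variance from the previous step

    def step(self, y):
        """O(1): fold in one observation and return the next-slot prediction."""
        if self.x is None:
            self.x = float(y)
        k = self.p / (self.p + self.r)
        self.x += k * (y - self.x)
        self.p = (1.0 - k) * self.p + self.q
        return self.x

    def restart(self, history):
        """O(T): discard stored results and replay the whole history."""
        self.x, self.p = None, self.p0
        prediction = None
        for y in history:
            prediction = self.step(y)
        return prediction


dual_core = {(True, True): 40.0, (True, False): 20.0,
             (False, True): 10.0, (False, False): 30.0}
three_state = aggregate(dual_core)
```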

Result
For DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; 3 of them are independent; no aggregation for partial idles
[Figure labels: ground-truth value; predicted value]
• The distance from the ground truth is the prediction error
• The smoothed series follows the observed series very well

Result (cont'd)
• All-states prediction: useful for location-aware optimization
• All-busy, all-idle, partial-idle prediction: useful for shared power plane optimization

Benefit Analysis Method
• Trace idle-busy events on a real quad-core processor
• Simulate OSPM C-state decision making (baseline)
• Simulate C-state decisions based on the prediction result
  • Prediction error injected
• Accumulate each C-state's power and transition energy cycle by cycle
  • Accumulated energy / run time = average power
• Compare prediction-based C-state selection against the OSPM baseline
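The energy accounting can be sketched as follows. This version works on whole intervals rather than cycle by cycle, and every number in it (per-state power, transition energy, interval lengths) is made up for illustration; none comes from the ACPI spec or from the authors' measurements.

```python
def average_power(intervals, power_w, transition_j):
    """Average power over a trace of (state, duration_s) intervals.

    power_w: watts drawn while resident in each state.
    transition_j: joules spent entering and leaving each state (0 if absent).
    """
    energy = 0.0
    run_time = 0.0
    for state, duration in intervals:
        energy += power_w[state] * duration + transition_j.get(state, 0.0)
        run_time += duration
    if run_time <= 0:
        raise ValueError("trace must have positive run time")
    return energy / run_time


# Hypothetical numbers, for illustration only.
power_w = {"C0": 10.0, "C1": 2.0, "C3": 0.5}
transition_j = {"C3": 0.001}

baseline = average_power([("C0", 1.0), ("C1", 1.0)], power_w, transition_j)
predicted = average_power([("C0", 1.0), ("C3", 1.0)], power_w, transition_j)
saving_pct = 100.0 * (baseline - predicted) / baseline
```

Running the same trace through the baseline policy and the prediction-based policy and comparing the two averages gives the relative power benefit.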

Benefit Result
*: power delta < transition energy (if OSPM selects C2/C3) or idle length > C2/C3 latency (if OSPM selects C1)
**: based on the power numbers in figure 1 (source: ACPI spec [1]); they do not represent the real power of our experimental processor.
***: based on the latency numbers in figure 1 (source: ACPI spec [1]); they do not represent the real latency of our experimental processor.

Summary & Future Work
• Good problem modeling and prediction are key to taking full advantage of the deep C-state's power benefit
• The KFM model works for CPU package pattern prediction for SPECWeb
• Future work: evaluate more workloads with more general assumptions

Q & A
