Power estimation in the algorithmic and register-transfer level September 25, 2006 Chong-Min Kyung
Software power analysis • Objectives: • Compare different programs • Select processors • Optimize software • Three levels of granularity (according to execution speed, availability, and accuracy): • Source code level • Instruction level • BFM (Bus Functional Model) level
Execution performed on • 1) Target processor: compile the source code and run it • Measure the heat generated to estimate the power? • Or monitor execution (with inserted monitoring instructions or extra hardware, both ideally with negligible overhead and disturbance to power and speed) to count the occurrences of each instruction and compute the total estimated power? • Dynamic code can also be handled. • Keeping the disturbance from the overhead code minimal is the key to accuracy.
Execution performed on • 2) Another processor: run a power-estimation program that takes the target-compiled code as input data. • Only the power consumption of static code can be estimated. • 3) Simulator: • Either at the source code level, • Or at the instruction level (same as 'Another processor')
Power estimation of software • Simplest approach: energy consumption is proportional to the program execution time. • Instruction-set approach: energy consumption differs for each instruction class (a class groups instructions with similar power behavior) and for each pair of instruction classes (inter-instruction dependency). • Per-instruction costs are measured by running long loops of the same instruction.
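The instruction-set approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the cost tables `BASE_COST_NJ` and `PAIR_OVERHEAD_NJ`, and all numeric values are hypothetical and do not come from any measured processor.

```python
from collections import Counter

# Hypothetical base energy per instruction class (nJ); illustrative values only.
BASE_COST_NJ = {"alu": 1.0, "load": 2.5, "store": 2.0, "branch": 1.5}

# Hypothetical inter-instruction overhead (nJ) for adjacent class pairs.
# Pairs that are not listed are assumed to add no extra energy.
PAIR_OVERHEAD_NJ = {
    ("alu", "load"): 0.5,
    ("load", "alu"): 0.5,
    ("store", "branch"): 0.25,
}


def estimate_energy_nj(trace):
    """Sum the base cost of every instruction plus the overhead of every adjacent pair."""
    base = sum(BASE_COST_NJ[cls] * n for cls, n in Counter(trace).items())
    pairs = Counter(zip(trace, trace[1:]))
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) * n for pair, n in pairs.items())
    return base + overhead


# A short executed-instruction trace, already mapped to instruction classes.
trace = ["load", "alu", "alu", "store", "branch"]
print(estimate_energy_nj(trace))  # 8.0 nJ base + 0.75 nJ overhead = 8.75
```

In practice the two tables would be filled from the long-loop measurements, and the trace would come from instruction counting on the target processor or from a simulator.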
Power estimation of software • Becomes more difficult with more complex processors (multi-threading, out-of-order execution, ...) and memory system architectures (caches, ...) • Accurate estimation requires software profiling on an ISS (instruction set simulator) with the bus access pattern. • An estimation model accurate to within 5% was developed for an ARM processor [Simunic, T., "Cycle-accurate simulation of energy consumption in embedded systems", DAC 99]
Algorithmic-level power estimation • Algorithmic-level power estimation consists of • Architecture estimation • Activation estimation • Power model evaluation • Architecture estimation by High-Level Synthesis (HLS) • Allocation, Scheduling, and Binding (ASB); allocation in the narrow sense is 'unit selection', which matters when an operation can be performed by more than one unit • Allocation and scheduling affect each other. • HLS considering communication (interconnect) • ASB + floorplanning • Cycle time violation check based on wire delay (derived from wire length estimation) • HLS considering both interconnect and power
Target architecture of HLS • Datapath <- dataflow of CDFG • Controller <- dataflow and control flow • Clock tree
Target architecture of HLS • Architecture synthesis = • Schedule the operations under timing and resource constraints, and • Allocate the required resources (operation units) • An operation unit (OU) can be an arithmetic module, a logic module, or a memory module. • Output of architecture synthesis is • A set of operation units • Registers • Steering logic to transfer data between operation units and registers, and • A controller whose control signals drive the MUXes, the OUs, and the enable signals of the registers • How to integrate power optimization into HLS?
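To illustrate how allocation and scheduling interact, here is a minimal sketch of resource-constrained list scheduling. It is an assumed textbook-style simplification, not the method of any particular HLS tool: every operation takes one control step, the dependency graph is acyclic, each unit type has at least one unit, and the function name `list_schedule` and the operation names are hypothetical.

```python
def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling with unit-latency operations.

    ops:   dict mapping operation name -> unit type (e.g. "mul", "add")
    deps:  dict mapping operation name -> list of predecessor operations
    units: dict mapping unit type -> number of allocated units (each >= 1)
    Returns a dict mapping operation name -> control step (starting at 0).
    Assumes the dependency graph is acyclic; otherwise the loop never ends.
    """
    schedule = {}
    step = 0
    while len(schedule) < len(ops):
        free = dict(units)  # units still available in this control step
        for op, unit in ops.items():  # dict order serves as the priority list
            if op in schedule:
                continue
            # An operation is ready once all predecessors finished in an earlier step.
            ready = all(p in schedule and schedule[p] < step for p in deps.get(op, []))
            if ready and free[unit] > 0:
                schedule[op] = step
                free[unit] -= 1
        step += 1
    return schedule


# y = a*b + c*d: two multiplications feeding one addition.
ops = {"m1": "mul", "m2": "mul", "a1": "add"}
deps = {"a1": ["m1", "m2"]}

print(list_schedule(ops, deps, {"mul": 1, "add": 1}))  # one multiplier: 3 steps
print(list_schedule(ops, deps, {"mul": 2, "add": 1}))  # two multipliers: 2 steps
```

Allocating a second multiplier shortens the schedule from three control steps to two, at the cost of more hardware. A power-aware flow would have to weigh exactly this kind of trade-off.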
RTL Power Modeling • RTL power modeling = constructing a model $\text{Power} = P(X_1, X_2, \ldots, X_n)$ from n model parameters
Issues of RTL power modeling • Granularity • Choice of model parameters: • Activity model, complexity model, or both? • Semantics of the model: • Cumulative or cycle-accurate? • How to build and store the model: • Top-down or bottom-up? • Table or equation?
Model granularity • Should not be too coarse: • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible • Nor too fine • FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller • Five main components: • Controller • Register file • Bus • Memory • Functional blocks
Activity model or complexity model, or both? • Model parameters: • What parameters are to be included in the model? • Model parameters must be observable at the RTL • $P_{total} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, an activity model ($A_i$) and a capacitance model ($C_i$) • The complexity model can be just a capacitance model, or it can also include transistor count to account for leakage current.
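The decoupled model can be evaluated directly once activities and capacitances are known. The sketch below assumes the usual dynamic-power constant k = 0.5 · Vdd² · f, with the activity expressed as transitions per clock cycle. The slide does not define k, so this choice, the function name `dynamic_power_w`, and the numeric values are all assumptions.

```python
def dynamic_power_w(activities, capacitances_f, vdd_v, freq_hz):
    """Evaluate P_total = k * sum(A_i * C_i), assuming k = 0.5 * Vdd^2 * f.

    activities:     switching activity A_i of each node (transitions per cycle)
    capacitances_f: capacitance C_i of each node in farads
    """
    k = 0.5 * vdd_v ** 2 * freq_hz  # assumed form of k, not given in the source
    return k * sum(a * c for a, c in zip(activities, capacitances_f))


# Two hypothetical nodes: 2 pF toggling every other cycle, 4 pF toggling every fourth.
p = dynamic_power_w([0.5, 0.25], [2e-12, 4e-12], vdd_v=1.0, freq_hz=1e8)
print(f"{p * 1e3:.3f} mW")
```

Keeping the activity list and the capacitance list separate mirrors the decoupling: the capacitance side is fixed per design, while the activity side changes with the input data.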
Activity parameters • RTL activity: an approximation of all intra-clock-cycle activities, projected onto the relevant clock transition point. • The main parameters are static and transition probabilities • Choose between bit-wise and word-wise probabilities according to the desired accuracy and speed • An n-input, m-output component has (n+m) bit-wise parameters but only two word-wise parameters • Additional parameters: • Transition density: average number of transitions per second • Also covers non-periodic signals • Correlation measures: useful for computing switching power • Spatial correlation • Temporal correlation = transition probability • Entropy: behaves somewhat like the transition probability $2p(1-p)$ • $H(p) = p \log_2(1/p) + (1-p) \log_2(1/(1-p))$
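As a small worked example, the bit-wise activity parameters can be extracted from a sampled bit stream as follows. The function names and the sample stream are made up for illustration. Note that the expression 2p(1-p) equals the transition probability only when successive samples are independent.

```python
import math


def static_probability(bits):
    """Fraction of samples in which the signal is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive sample pairs in which the signal toggles."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def binary_entropy(p):
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


bits = [0, 1, 1, 0, 1, 0, 0, 1]  # one sample per clock cycle
p = static_probability(bits)
print(p)                             # 0.5
print(transition_probability(bits))  # 5 toggles over 7 pairs
print(2 * p * (1 - p))               # 0.5, the value for temporally independent samples
print(binary_entropy(p))             # 1.0
```

Both 2p(1-p) and H(p) peak at p = 0.5 and vanish at p = 0 and p = 1, which is the sense in which entropy resembles the transition probability.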
Complexity parameters • Capacitance ~ gate count, transistor count, ... • The only complexity parameters available at RTL are • Width of a component: number of inputs and outputs • Number of states: applicable to the controller • Architecture-specific models • $k_1 N^2$ for an N×N multiplier • $k_2 N$ for a ripple-carry adder
Model semantics • Cumulative (average) vs. cycle-accurate: • Cumulative power = sum of the average (cumulative) power over all modules • Cycle-accurate power = sum of the power over all modules for each clock cycle • Cumulative power is only good enough for tracking battery life, average heat dissipation, etc. • Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis. • Pseudo-cycle-accurate power estimation may be adequate for dynamic power management.
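The difference between the two semantics shows up in a toy example. The per-module, per-cycle numbers below are invented for illustration, and the helper names `cycle_accurate` and `cumulative` are hypothetical.

```python
# Hypothetical power (mW) of each module in four consecutive clock cycles.
power_mw = {
    "controller": [1.0, 1.0, 1.0, 1.0],
    "datapath": [2.0, 6.0, 2.0, 6.0],
}


def cycle_accurate(per_module):
    """Total power in each clock cycle: sum over modules, cycle by cycle."""
    return [sum(vals) for vals in zip(*per_module.values())]


def cumulative(per_module):
    """Single average figure: sum over modules of each module's average power."""
    return sum(sum(vals) / len(vals) for vals in per_module.values())


trace = cycle_accurate(power_mw)
print(trace)                 # [3.0, 7.0, 3.0, 7.0]
print(cumulative(power_mw))  # 5.0
print(max(trace))            # 7.0, a peak that the average hides
```

The cumulative figure of 5 mW suffices for a battery-life estimate, but only the per-cycle trace reveals the 7 mW peaks that matter for IR drop and noise analysis.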
How to build and store the model • Model construction • Top-down: suitable • When the implementation follows a predictable template, e.g., memory • When dealing with a new circuit for which no measured data is available • Bottom-up: • Can be equation-based • A template for the power model is given first, • Statistical techniques are then used to fit the model to the measured values by adjusting its coefficients • Model storage • Equation-based • Table-based
Accuracy issue • Metric: $E = |P_e - P| / \max(P_e, P)$, with $P_e$ the estimated power and $P$ the reference power • Average error • Standard deviation
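A minimal sketch of how the metric and its two summary statistics could be computed over a set of test cases. The (estimated, reference) pairs are invented, the powers are assumed to be positive, and the use of the population standard deviation is a choice made here, not something stated in the source.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P); assumes both powers are positive."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


# Hypothetical (estimated, reference) power pairs in mW.
pairs = [(9.0, 10.0), (12.0, 10.0), (10.0, 10.0)]
errors = [relative_error(est, ref) for est, ref in pairs]

print(errors)
print(statistics.mean(errors))    # average error
print(statistics.pstdev(errors))  # standard deviation of the error
```

Dividing by the larger of the two values keeps E between 0 and 1, so overestimates and underestimates are penalized on the same scale.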
Macro modeling flow • Choose model parameters • E.g., average switching activity of inputs and/or outputs • Design the training set • Good coverage, unbiased, and representative of actual operating conditions • Characterization • Run a power-accurate lower-level simulator • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities • Model extraction • For an equation-based model, run an LMS regression engine • For a table-based model, merge entries according to the available table space
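The model-extraction step for an equation-based macro-model can be sketched as a least-squares fit. This example uses the closed-form ordinary least-squares solution for a one-parameter template P = c0 + c1 · activity as a stand-in for the regression engine mentioned above. The template, the function name `fit_linear`, and the training data are assumptions for illustration; in a real flow the power values would come from gate-level simulation.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Training set: average input switching activity vs. power (mW).
# These values are synthetic stand-ins for characterization results.
activity = [0.1, 0.3, 0.5, 0.7, 0.9]
power_mw = [1.4, 2.2, 3.0, 3.8, 4.6]

c0, c1 = fit_linear(activity, power_mw)
print(f"P = {c0:.2f} + {c1:.2f} * activity")


def macro_model(a):
    """Evaluate the fitted macro-model at RTL simulation time."""
    return c0 + c1 * a


print(f"{macro_model(0.4):.2f} mW")
```

The training activities span the range from 0.1 to 0.9 evenly, which reflects the coverage and unbiasedness requirements on the training set. A fit based only on low-activity samples would extrapolate poorly.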