
Power estimation in the algorithmic and register-transfer level

Power estimation in the algorithmic and register-transfer level. September 25, 2006. Chong-Min Kyung. Software power analysis. Objectives: compare different programs, select processors, optimize software. Three levels of granularity (according to execution speed, availability, and accuracy).


Presentation Transcript


  1. Power estimation in the algorithmic and register-transfer level • September 25, 2006 • Chong-Min Kyung

  2. Software power analysis • Objectives: • Compare different programs • Select processors • Optimize software • Three levels of granularity (according to execution speed, availability, and accuracy): • Source code level • Instruction level • BFM (Bus Functional Model) level

  3. Execution performed on • 1) Target processor: compile the source code and run it • Measure the heat generated to estimate the power? • Or monitor execution (with inserted monitoring instructions or dedicated hardware, both ideally with negligible overhead on power and speed) to count the occurrences of each instruction and compute the total estimated power? • Dynamic code can also be handled. • Keeping the disturbance from the overhead code minimal is the key to accuracy.

  4. Execution performed on • 2) Another processor: run a power-estimation program that takes the target-compiled code as input data. • Only the power consumption of static code can be estimated. • 3) Simulator: • Either at the source code level, • Or at the instruction level (same as ‘Another processor’)

  5. Power estimation of software • Simplest approach: energy consumption is proportional to the program execution time. • Instruction-set approach: energy consumption differs for each instruction class (a class groups instructions with similar power behavior) and for each pair of instruction classes (inter-instruction dependency). • Measurement is done by running long loops of the same instruction.
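
The instruction-set approach can be sketched as a base cost per instruction class plus an overhead per adjacent pair of classes. This is only an illustration: the class names, the energy values, and the function `estimate_energy_nj` are hypothetical and do not come from the slides or from any measured processor.

```python
# Hypothetical per-class base energies and pair overheads in nJ (illustrative values only).
BASE_ENERGY_NJ = {"alu": 1.0, "load": 2.5, "store": 2.0, "branch": 1.5}
PAIR_OVERHEAD_NJ = {("alu", "load"): 0.3, ("load", "alu"): 0.3, ("alu", "branch"): 0.2}


def estimate_energy_nj(trace):
    """Sum the base cost of every instruction plus the overhead of every adjacent pair."""
    base = sum(BASE_ENERGY_NJ[cls] for cls in trace)
    # Pairs without an entry are assumed to add no inter-instruction overhead.
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) for pair in zip(trace, trace[1:]))
    return base + overhead


# An instruction-class trace, e.g. as counted by a monitor or an instruction-level simulator.
trace = ["load", "alu", "alu", "store", "branch"]
energy = estimate_energy_nj(trace)
```

In practice the two tables would be filled from measurements, for example by running long loops of one instruction (base cost) or of an alternating pair (pair overhead).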

  6. Power estimation of software • Becomes more difficult with more complex processors (multi-threading, out-of-order execution, …) and memory system architectures (caches, …) • Accurate estimation requires software profiling on an ISS (instruction set simulator) with the bus access pattern. • An estimation model accurate to within 5% was developed for the ARM processor [T. Simunic, "Cycle-accurate simulation of energy consumption in embedded systems", DAC 99]

  7. Algorithmic-level power estimation • Algorithmic-level power estimation consists of • Architecture estimation • Activation estimation • Power model evaluation • Architecture estimation by high-level synthesis (HLS) • Allocation, scheduling, and binding (allocation in the narrow sense is ‘unit selection’, where each operation can be performed by more than one unit) • Allocation and scheduling affect each other. • HLS considering communication (interconnect) • Allocation, scheduling, and binding (ASB) + floorplanning • Cycle-time violation check based on wire delay (derived from wire-length estimation) • HLS considering both interconnect and power

  8. Target architecture of HLS • Datapath <- dataflow of the CDFG (control/data flow graph) • Controller <- dataflow and control flow • Clock tree

  9. Target architecture of HLS • Architecture synthesis = • Schedule the operations under timing and resource constraints, and • Allocate the required resources (operation units) • An operation unit (OU) can be an arithmetic module, a logic module, or a memory module. • The output of architecture synthesis is • A set of operation units • Registers • Steering logic to transfer data between operation units and registers, and • A controller whose control signals drive the MUXes, the OUs, and the enable signals of the registers • How to integrate power optimization into HLS?
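
To make the interplay between scheduling and allocation concrete, here is a minimal sketch of resource-constrained list scheduling. It is one common heuristic, not necessarily the one the lecture has in mind. It assumes single-cycle operations, an acyclic dependency graph, and at least one unit of every required type; the function `list_schedule` and the example operation names are hypothetical.

```python
def list_schedule(ops, deps, units):
    """Assign each operation a start cycle.

    ops:   {operation name: unit type it needs}
    deps:  {operation name: [names of predecessor operations]}
    units: {unit type: number of allocated units}
    Assumes every operation takes exactly one cycle.
    """
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        free = dict(units)  # units available in this cycle
        for name in sorted(ops):
            if name in schedule:
                continue
            # Ready only if all predecessors finished in an earlier cycle.
            ready = all(p in schedule and schedule[p] < cycle for p in deps.get(name, []))
            if ready and free.get(ops[name], 0) > 0:
                schedule[name] = cycle
                free[ops[name]] -= 1
        cycle += 1
    return schedule


ops = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["a1"]}

one_mul = list_schedule(ops, deps, {"mul": 1, "add": 1})  # multiplies are serialized
two_mul = list_schedule(ops, deps, {"mul": 2, "add": 1})  # multiplies run in parallel
```

Allocating a second multiplier shortens the schedule by one cycle, which illustrates why allocation and scheduling cannot be decided independently.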

  10. RTL power modeling • RTL power modeling = constructing a model $\text{Power} = P(X_1, X_2, \ldots, X_n)$ from $n$ model parameters

  11. Issues of RTL power modeling • Granularity • Choice of model parameters: • Activity model, complexity model, or both? • Semantics of the model: • Cumulative or cycle-accurate? • How to build and store the model: • Top-down or bottom-up? • Table or equation?

  12. Model granularity • Should not be too coarse: • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible • Nor too fine • An FSMD (FSM with datapath) is a reasonable choice, since an RTL design is an interaction of datapath and controller • Five main components: • Controller • Register file • Bus • Memory • Functional blocks

  13. Activity model or complexity model, or both? • Model parameters: • What parameters are to be included in the model? • Model parameters must be observable at the RTL • $P_{\text{total}} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, an activity model ($A_i$) and a capacitance model ($C_i$) • Activity model, complexity model, or both? • The complexity model can be just a capacitance model, or it can also include the transistor count to account for leakage current.
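
The decoupled model can be evaluated as a weighted sum over components, reading the slide's expression as a sum over i of activity times capacitance. The sketch below assumes that `k` bundles the technology and operating constants (for example 0.5 * Vdd**2 * f), which the slide does not spell out; the function `total_power` and the numbers are hypothetical.

```python
def total_power(activities, capacitances, k):
    """Evaluate P_total = k * sum_i(A_i * C_i) over matching component lists."""
    if len(activities) != len(capacitances):
        raise ValueError("need one activity value per capacitance value")
    return k * sum(a * c for a, c in zip(activities, capacitances))


# Two components: switching activities (dimensionless) and capacitances (arbitrary units).
p_total = total_power([0.5, 0.25], [2.0, 4.0], k=1.0)
```

Keeping the two lists separate mirrors the decoupling: the activity values come from simulation or probabilistic analysis, while the capacitance values come from the complexity model.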

  14. Activity parameters • RTL activity: an approximation of all intra-clock-cycle activity, projected onto the relevant clock transition point. • The main parameters are static and transition probabilities • Choose between bit-wise and word-wise probabilities according to the desired accuracy and speed • An n-input, m-output component has (n+m) bit-wise parameters but only two word-wise parameters • Additional parameters: • Transition density: average switching rate per second • Also covers non-periodic signals • Correlation measures: useful for computing switching power • Spatial correlation • Temporal correlation = transition probability • Entropy: behaves somewhat like the transition probability $2p(1-p)$ • $p \log_2(1/p) + (1-p) \log_2(1/(1-p))$
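
As an illustration of the bit-wise activity parameters, the following sketch computes the static probability, the transition probability, and the entropy of a single sampled signal. The function names and the sample bit stream are hypothetical; the entropy expression is the one given on the slide.

```python
import math


def static_probability(bits):
    """Fraction of clock cycles in which the signal is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the signal toggles."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def entropy(p):
    """Binary entropy p*log2(1/p) + (1-p)*log2(1/(1-p)), defined as 0 at p = 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


bits = [0, 1, 1, 0, 1, 0, 0, 1]  # one signal sampled over eight cycles
p_static = static_probability(bits)
p_trans = transition_probability(bits)
h = entropy(p_static)
```

Both the entropy and the expression 2p(1-p) peak at p = 0.5 and vanish at p = 0 and p = 1, which is the sense in which the two behave similarly.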

  15. Complexity parameters • Capacitance ~ gate count, transistor count, etc. • The only complexity parameters available at the RTL are • Width of a component: number of inputs and outputs • Number of states: applicable to the controller • Architecture-specific models • $k_1 N^2$ for an $N \times N$ multiplier • $k_2 N$ for a ripple-carry adder
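
A small sketch of architecture-specific complexity models, assuming the multiplier term scales quadratically with the width N and the ripple-carry adder term linearly. The coefficients `k1` and `k2` would have to be fitted per technology; the values and function names used here are hypothetical.

```python
def multiplier_capacitance(n_bits, k1):
    """Assumed quadratic model for an N x N multiplier: k1 * N**2."""
    return k1 * n_bits ** 2


def ripple_adder_capacitance(n_bits, k2):
    """Assumed linear model for an N-bit ripple-carry adder: k2 * N."""
    return k2 * n_bits


# Doubling the width quadruples the multiplier estimate but only doubles the adder estimate.
mul_8, mul_16 = multiplier_capacitance(8, 1.0), multiplier_capacitance(16, 1.0)
add_8, add_16 = ripple_adder_capacitance(8, 1.0), ripple_adder_capacitance(16, 1.0)
```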

  16. Model semantics • Cumulative (average) vs. cycle-accurate: • Cumulative power = sum of the average (cumulative) power over all modules • Cycle-accurate power = sum of the power over all modules for each clock cycle • Cumulative power is only good for tracking battery lifetime, average heat dissipation, etc. • Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis. • Pseudo-cycle-accurate power estimation may be adequate for dynamic power management.
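
The difference between the two semantics can be shown on a toy example with per-cycle power traces for two modules. The module names, the trace values, and the function names are hypothetical.

```python
def cycle_accurate_power(traces):
    """Per-cycle total: sum the power of all modules in each clock cycle."""
    return [sum(cycle) for cycle in zip(*traces.values())]


def cumulative_power(traces):
    """Single number: sum of each module's average power over the run."""
    return sum(sum(trace) / len(trace) for trace in traces.values())


# Per-cycle power of each module (arbitrary units, illustrative values only).
traces = {
    "datapath": [2.0, 6.0, 2.0, 2.0],
    "controller": [1.0, 1.0, 1.0, 1.0],
}
per_cycle = cycle_accurate_power(traces)
average = cumulative_power(traces)
peak = max(per_cycle)
```

The cumulative figure hides the spike in the second cycle, which is exactly the information that IR-drop and noise analysis need.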

  17. How to build and store the model • Model construction • Top-down: good • When the implementation follows some predictable template, e.g., memory • When dealing with a new circuit for which no measured data is available • Bottom-up: • Can be equation-based • A template for the power model is given first, • Statistical techniques are then used to fit the model to the measured values by adjusting its coefficients • Model storage • Equation-based • Table-based

  18. Accuracy issue • Metric: $E = |P_e - P| / \max(P_e, P)$, where $P_e$ is the estimated power and $P$ the reference power • Average error • Standard deviation
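
The accuracy metric and its two summary statistics can be computed as follows. The sketch reads the slide's metric as |Pe - P| / max(Pe, P), with Pe taken to be the estimated power and P the reference power; the sample pairs and the function name `relative_error` are hypothetical.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P); assumes both powers are positive."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


# (estimated, reference) power pairs, illustrative values only.
pairs = [(9.0, 10.0), (12.0, 10.0), (10.0, 10.0)]
errors = [relative_error(pe, p) for pe, p in pairs]

average_error = statistics.mean(errors)
error_std = statistics.pstdev(errors)  # population standard deviation of the errors
```

Dividing by the larger of the two values keeps the metric between 0 and 1 and treats over- and under-estimation symmetrically.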

  19. Macro-modeling flow • Choose model parameters • E.g., average switching activity of inputs and/or outputs • Design the training set • Good coverage, unbiased, and representative of actual operating conditions • Characterization • Run the power-accurate lower-level simulator • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities • Model extraction • For an equation-based model, run an LMS (least mean squares) regression engine • For a table-based model, merge entries according to the available table space
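
The model-extraction step for an equation-based macro-model can be sketched with a one-parameter linear template, power = c0 + c1 * activity. Closed-form ordinary least squares is used here as a simple stand-in for the regression engine mentioned on the slide; the training data, which would normally come from gate-level characterization, is hypothetical, as are the function names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Training set: average input switching activity vs. power from a lower-level simulator
# (illustrative values only).
activity = [0.1, 0.2, 0.3, 0.4]
power = [1.2, 1.4, 1.6, 1.8]

c0, c1 = fit_line(activity, power)


def predict_power(a):
    """Evaluate the extracted macro-model at switching activity a."""
    return c0 + c1 * a


estimate = predict_power(0.25)
```

A table-based model would instead store power values indexed by quantized activity and merge neighboring entries until the table fits the available space.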
