System-level Power Estimation and Optimization. 2006.09.03 Chong-Min Kyung KAIST. Contents. Introduction System-level Power Estimation System-level Power Optimization. Introduction. Power classification Static power ≈ leakage power Dynamic power Switching power Short-circuit power
System-level Power Estimation and Optimization 2006.09.03 Chong-Min Kyung KAIST
Contents • Introduction • System-level Power Estimation • System-level Power Optimization
Introduction • Power classification • Static power ≈ leakage power • Dynamic power • Switching power • Short-circuit power • Glitch power
Introduction • Power calculation • Static power • $P_{\text{total\_leakage}} = \sum P_{\text{cell\_leakage}}$ • Dynamic power • $P_{\text{internal}} = \mathrm{Func}(C_{\text{load}}, TR_{\text{input}}, \text{output})$ • TR: toggle rate • $P_{\text{glitch}} = V_{dd}^2 \sum (C_{\text{load,net}} \cdot f_{\text{glitch}} \cdot \tau)$ • $f_{\text{glitch}}$: the glitch frequency • $\tau$: a factor for the glitch width • $P_{\text{switching}} = \tfrac{1}{2} V_{dd}^2 \sum (C_{\text{load}} \cdot TR_{\text{output}})$ • Ratio • Above 0.1 µm: switching power is 70~90% • Below 0.07 µm: leakage power is > 50% • Data-intensive applications • Switching power is the dominant factor.
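As a rough illustration of the switching and glitch power formulas on this slide, the sketch below evaluates them over a list of nets. The function names and the example net values are assumptions made for illustration, not data from the slides.

```python
def switching_power(vdd, nets):
    """P_switching = 1/2 * Vdd^2 * sum(C_load * TR_output).

    nets: iterable of (c_load_farads, toggle_rate_hz) pairs.
    """
    return 0.5 * vdd ** 2 * sum(c * tr for c, tr in nets)


def glitch_power(vdd, nets):
    """P_glitch = Vdd^2 * sum(C_load_net * f_glitch * tau).

    nets: iterable of (c_load_farads, f_glitch_hz, tau) triples.
    """
    return vdd ** 2 * sum(c * f * tau for c, f, tau in nets)


# Hypothetical example: two nets at Vdd = 1.2 V.
example_nets = [(10e-15, 100e6), (20e-15, 50e6)]
p_sw = switching_power(1.2, example_nets)  # in watts
```

Because the sum runs over nets, the same helper works for a single cell or a whole block as long as per-net load and toggle rate are available.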
Introduction • Opportunities for power reduction • For low power design 1. Power model generation 2. Power estimation 3. Power optimization Academic research Commercial tool
Contents • Power Model Generation • Analytical Method • Empirical Method • System-level Power Estimation • Hardware Power Estimation • Software Power Estimation • Bus Power Estimation
Power Model Generation 1. Analytical method • Uses average values of design parameters, without considering differences in circuit style, clocking strategy, or layout technique • Average capacitance, equivalent gate count, number of primary inputs, etc. • Mainly used for behavior-level power estimation • when no technology library or implementation information is available • Very low accuracy 2. Empirical method • Uses parameters measured from existing implementations 2-1. Fixed-activity model 2-2. Activity-sensitive model
Power Model Generation 2-1. Fixed-activity model • Uses the data sheet of a specific hardware block • $P_{\text{processor}} = C_{\text{processor}} \times V_{DD}^2 \times \mathit{freq}$ • $C_{\text{processor}} = P_{\text{data\_sheet}} / (V_{\text{data\_sheet}}^2 \times \mathit{freq}_{\text{data\_sheet}})$ • Low accuracy • Mainly used for coarse-grained system-level power estimation 2-2. Activity-sensitive model • Uses signal activity or its statistics, which depend on the testbench • Transition-sensitive model • Power model is a look-up table (LUT). • Very high accuracy • Statistical activity model • Power model is a LUT or an equation. • High accuracy
Table (transition-sensitive LUT): index = (previous input vector, current input vector) → switch capacitance (pF)
(0_1 … 0_n, 0_1 … 0_n) → Cap_0
(0_1 … 0_n, 0_1 … 1_n) → Cap_1
…
(1_1 … 1_n, 1_1 … 1_n) → Cap_{2^n-1}
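The fixed-activity model amounts to two lines of arithmetic: derive an effective capacitance from the data sheet, then rescale it to the target voltage and frequency. A minimal sketch, with made-up data-sheet numbers:

```python
def effective_capacitance(p_datasheet, v_datasheet, f_datasheet):
    """C_processor = P_data_sheet / (V_data_sheet^2 * freq_data_sheet)."""
    return p_datasheet / (v_datasheet ** 2 * f_datasheet)


def processor_power(c_processor, vdd, freq):
    """P_processor = C_processor * VDD^2 * freq."""
    return c_processor * vdd ** 2 * freq


# Hypothetical data sheet: 0.4 W at 2.0 V and 200 MHz.
c_eff = effective_capacitance(0.4, 2.0, 200e6)
# Estimated power when the same block runs at 1.0 V and 100 MHz.
p_est = processor_power(c_eff, 1.0, 100e6)
```

The estimate ignores what the processor is actually doing, which is why the slide rates this model as low accuracy.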
Macro Modeling Method • Macro modeling method • Raises the abstraction of the power model by characterizing macro cells • Mainly used to reduce power model complexity in activity-sensitive power model generation • Macro cell • 32-bit adder, multiplier, MUX, etc. • Reduced computation complexity at the cost of accuracy • Macro cell characterization • Synthesize the macro cell with a basic cell library • Estimate the power of the macro cell with various testbenches • Generate the power model and reduce its complexity • This concept can be used to raise the abstraction of the power model in hardware-level or software-level power estimation.
Macro Modeling Method • Power model of macro modeling method • Statistical activity model • LUT-based model • For each bus component, build 3-D LUT (with axes of Pin, Din, Dout) • Fill power value at each point (Pin, Din, Dout) • Requires a lot of memory space • Equation-based model • Build a polynomial approximating power consumption. • From a large number of input patterns, perform analysis to determine the coefficients. • Requires little memory space • Pin: average input signal probability • Din : average input switching activity • Dout: average output zero delay switching activity
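The LUT-based variant can be sketched as a table indexed by quantized (Pin, Din, Dout) values. The bin count, the averaging of samples that fall into the same bin, and all names below are assumptions made for illustration.

```python
from collections import defaultdict


def bin_index(x, bins):
    """Map a value in [0, 1] to one of `bins` equal-width bins."""
    return min(int(x * bins), bins - 1)


def build_lut(samples, bins=4):
    """Build a 3-D LUT from (p_in, d_in, d_out, power) characterization samples.

    Samples landing in the same bin are averaged.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p_in, d_in, d_out, power in samples:
        key = (bin_index(p_in, bins), bin_index(d_in, bins), bin_index(d_out, bins))
        sums[key] += power
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def lookup(lut, p_in, d_in, d_out, bins=4):
    """Return the stored power for this point, or None if never characterized."""
    key = (bin_index(p_in, bins), bin_index(d_in, bins), bin_index(d_out, bins))
    return lut.get(key)


# Hypothetical characterization data for one macro cell.
lut = build_lut([(0.5, 0.1, 0.1, 2.0), (0.55, 0.15, 0.2, 4.0)])
```

The table grows with the cube of the bin count, which is the memory cost the slide mentions; an equation-based model trades that storage for a fitting step.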
System-level Power Estimation • Estimation speed and power model • Trade-off between estimation speed and accuracy of power model • Abstraction of power estimation • System-level power estimation • Software-level power estimation • Hardware-level power estimation • Behavior-level, RT-level, gate-level, circuit-level (Figure labels: relative power results; absolute power results)
System-level Power Estimation • System-level power estimation • Relative value of power consumption is important. • Objective • Power profiling and design exploration • System-level power estimation is composed of 1. Hardware power estimation 2. Software power estimation in processor 3. Bus power estimation
Hardware Power Estimation • RT-level power estimation • Dynamic simulation-based power estimation with coarse-grained net model from power macro model database and testbench
Hardware Power Estimation Tool • There are some commercial tools for hardware power estimation • RT-level • Synopsys Power Compiler™ • Gate-level • Synopsys PrimePower™, Synopsys Power Compiler™ • Circuit-level • SPICE, Synopsys PowerMill™, Cadence VoltageStorm™
Software Power Estimation • Estimation • A processor is too complex to estimate at RT level. • Power consumption is related to each instruction and to the instruction sequence. • Estimation method • A power model is added to the ISS for instruction-level power profiling. • $E = \sum_i B_i N_i + \sum_{i,j} O_{ij} N_{ij} + \sum_k S_k$ • $B_i$: energy consumption of instruction i • $N_i$: number of executions of instruction i • $O_{ij}$: energy consumption when instruction i is followed by instruction j • $N_{ij}$: number of occurrences of the pair (instruction i, instruction j) • $S_k$: other effects such as cache misses, pipeline stalls, etc.
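One way to read the quantities defined on this slide is as a sum of base costs (B_i times N_i), inter-instruction costs (O_ij times N_ij), and other effects (S_k). The sketch below applies that reading to an instruction trace; the energy tables and the trace are invented for illustration.

```python
from collections import Counter


def program_energy(trace, base, overhead, other_effects=()):
    """Instruction-level energy estimate for an executed instruction trace.

    base:          {instruction: B_i}
    overhead:      {(i, j): O_ij} for instruction i followed by j
    other_effects: iterable of S_k terms (cache misses, stalls, ...)
    """
    n_i = Counter(trace)                    # N_i
    n_ij = Counter(zip(trace, trace[1:]))   # N_ij over adjacent pairs
    e_base = sum(base[i] * n for i, n in n_i.items())
    e_pair = sum(overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return e_base + e_pair + sum(other_effects)


# Hypothetical energy values in arbitrary units.
trace = ["add", "mul", "add", "add"]
base = {"add": 1.0, "mul": 3.0}
overhead = {("add", "mul"): 0.5, ("mul", "add"): 0.5}
energy = program_energy(trace, base, overhead, other_effects=[2.0])
```

In an ISS the same counters would be updated as each instruction retires, so profiling adds little to simulation time.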
Software Power Estimation • Power model • Instruction-level power model • Inter-instruction effect consideration • Dynamic effect (cache miss, branch prediction, etc) • Power modeling method 1) White-box approach 2) Black-box approach
White-box Approach • Power model • Activity-sensitive model • Characterization • Use macro modeling method • Process • Run gate-level simulation • Find the predominant parameter • Reduce power model complexity • Simple equation or reduced LUT • Make the instruction-level power model • Reducing the power model complexity degrades accuracy and increases estimation speed. (Figure: accuracy goes from high to low as speed goes from low to high.)
Black-box Approach • Characterization flow • Measurement • Characterization (Figure: the measurement step records the current I(t) with an oscilloscope used on the ammeter principle, r << R; the characterization step produces the instruction-level power model.)
Black-box Approach • Measurement • By current measurement on a real chip • Power model • Activity-sensitive power model • Statistical activity model • Characterization process • Current is measured on a real chip over multiple iterations of a subroutine • Compare the measured values with the ISS, including dynamic effects • Find a power equation that resembles the measured power graph • Determine the coefficients of the power equation by experimental iteration • It is important to find the equation closest to the measurement results.
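The slides do not give the form of the power equation, so the sketch below assumes the simplest case, a line `power = a + b * activity`, and fits its two coefficients to measured points by ordinary least squares. The measurement values are invented.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical measurements: activity parameter vs. measured power (mW).
activity = [0.0, 1.0, 2.0, 3.0]
measured = [10.0, 12.0, 14.0, 16.0]
intercept, slope = fit_linear(activity, measured)
```

A real characterization would likely use several parameters and compare candidate equations by their residual against the measured graph.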
Black-box Approach • Measurement method • The program under measurement is isolated using an interrupt signal, NOP instructions, and processor wait states, both to locate the exact measurement position and to synchronize. (Figure: a pulse/pattern generator supplies the clock and interrupt signal to the target chip under measurement and a synchronization signal to a digital sampling oscilloscope, which captures the current signal.) R. Muresan and C. Gebotys, “Current dynamics-based macro-model for power simulation in a complex VLIW DSP processor”, IEE Proc.-Comput. Digit. Tech., 2002
Software Power Estimation Tool for Research Purpose • SimplePower • Functional simulator • SimplePower core based on SimpleScalar ISA • Power model • Activity-sensitive power model • Direct simulation and profiling based on input transitions • Generates switch capacitance tables • Cycle-accurate activation information • Implementation-based signal generation
Table (switch capacitance): index = (previous input vector, current input vector) → switch capacitance (pF)
(0_1 … 0_n, 0_1 … 0_n) → Cap_0
(0_1 … 0_n, 0_1 … 1_n) → Cap_1
…
(1_1 … 1_n, 1_1 … 1_n) → Cap_{2^n-1}
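A switch capacitance table of this kind can be sketched as a dictionary keyed by (previous vector, current vector). The `measure` callback below is a hypothetical stand-in for the circuit-level characterization step; here it is a toy that charges 1 pF per toggled bit.

```python
def characterize(n_bits, measure):
    """Fill the table for every (previous, current) input-vector pair."""
    size = 1 << n_bits
    return {(prev, cur): measure(prev, cur)
            for prev in range(size) for cur in range(size)}


def switched_capacitance(table, vectors):
    """Sum the table entries along a sequence of input vectors."""
    return sum(table[(prev, cur)] for prev, cur in zip(vectors, vectors[1:]))


def toy_measure(prev, cur):
    """Toy stand-in for simulation: 1 pF per bit that toggles."""
    return float(bin(prev ^ cur).count("1"))


table = characterize(2, toy_measure)
total_pf = switched_capacitance(table, [0b00, 0b11, 0b10])
```

Indexing on both vectors makes the table size grow exponentially with input width, which is why this style suits small functional units.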
Software Power Estimation Tool for Research Purpose • Wattch • Architecture-level power estimation • Functional simulator • SimpleScalar: cycle-level performance simulator • Power model • Fixed activity power model • Categories • Array structure • Fully associative CAM • Combinational logic and wires • Clocking logic • Example: Array structure • Power = C1 + C2 * A + C3 * B • A: Bit line number, B: Word line number • C1: Diffusion cap., C2: Gate cap., C3: Metal cap.
Bus Power Estimation • Power consumed on the bus consists of two parts • Bus component power • Power consumed internally in the bus components • Arbiter, decoder, muxes • Interconnection power • Power consumed on the bus wires that connect the master and slave interfaces and the bus components • Address bus, data bus, control signals
Bus Component Power Estimation • At system level, only the structural information about the bus architecture can be obtained. • Bus interconnection • Bus width • A global bus power model is used for estimation • The characterized power models of the bus components are in the global bus power model • Arbiter, decoder, multiplexer • Behavior, FSM (Figure: processor, IP #1, IP #2, and memory attached to a bus, with the global bus power model.)
Bus Component Characterization • Macro model • Pre-calculated power cubic • Useful to apply on system level power estimation. • Input parameter of the macro models • Data and address bus width, or the operating frequency • The number of masters and slaves • Input/output data characteristics • The switching activity, the probability of signal or the Hamming distance of two successive data
Bus Power Analysis • AMBA AHB bus power analysis • A standard for on-chip communication • Power analysis process • Bus structure decomposition • Arbiter • Decoder • Multiplexer • Build a macro model of each component • Bus behavior decomposition and build power FSM • IDLE, READ, WRITE, and IDLE with handover • Monitor bus signal activity • Power analysis through the power FSM (Figure: global bus power model with masters #1 to #3 and slaves #1 to #3 connected through an arbiter, a decoder, and multiplexers.)
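A power FSM over the four bus states named on this slide can be sketched as a per-cycle classifier plus an energy table. The three boolean signals and the per-cycle energy values are hypothetical simplifications, not AHB signal names or measured numbers.

```python
# Hypothetical energy per cycle in each power-FSM state (pJ).
ENERGY_PER_CYCLE_PJ = {
    "IDLE": 1.0,
    "IDLE_HANDOVER": 2.0,
    "READ": 5.0,
    "WRITE": 6.0,
}


def classify_cycle(active, write, handover):
    """Map monitored bus activity in one cycle to a power-FSM state."""
    if active:
        return "WRITE" if write else "READ"
    return "IDLE_HANDOVER" if handover else "IDLE"


def bus_energy(cycles):
    """Accumulate energy over (active, write, handover) tuples, one per cycle."""
    return sum(ENERGY_PER_CYCLE_PJ[classify_cycle(*cycle)] for cycle in cycles)


# Idle, read, write, then idle while ownership is handed over.
monitored = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (False, False, True),
]
total_pj = bus_energy(monitored)
```

In a fuller model each state's energy would come from the macro models of the arbiter, decoder, and multiplexers rather than from constants.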
Interconnection Power Estimation • Power consumption on each wire • $P = \tfrac{1}{2} V_{dd}^2 \cdot C \cdot f \cdot \alpha$ • $V_{dd}$: voltage swing between logic levels 1 and 0 • C: capacitance of the wire • f: clock frequency • α: switching activity • $V_{dd}$ and f are given as fixed values. • We need to find C and α. • C can be obtained from the wire capacitance model. • α can be obtained from system-level simulation.
Interconnection Power Estimation • Wire capacitance model • $C = \varepsilon_{ox} \cdot W \cdot L / x_{int}$ * • $\varepsilon_{ox}$: constant, $3.45 \times 10^{-13}$ F/cm, permittivity of SiO2 • $x_{int}$: oxide thickness underneath the interconnect • W: interconnect width • L: interconnect length • W and $x_{int}$ can be obtained from the technology parameters. • L can be estimated from the chip area A. * J. P. Uyemura, ‘Circuit Design for CMOS VLSI’, Kluwer Academic Publishers, 1992.
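Given the parameters listed on this slide, a parallel-plate estimate of wire capacitance is C = ε_ox · W · L / x_int. That form is assumed here from the parameter definitions, and the dimensions in the example are invented.

```python
EPS_OX_F_PER_CM = 3.45e-13  # permittivity of SiO2, from the slide


def wire_capacitance(width_cm, length_cm, x_int_cm):
    """Parallel-plate estimate: C = eps_ox * W * L / x_int, in farads."""
    return EPS_OX_F_PER_CM * width_cm * length_cm / x_int_cm


# Hypothetical wire: 1 um wide, 1 cm long, over 1 um of oxide.
c_wire = wire_capacitance(1e-4, 1.0, 1e-4)
```

The estimate ignores fringing and coupling to neighbouring wires, so it is best treated as a lower bound for relative comparisons.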
Interconnection Power Estimation • Switching activity model • Switching activity can be obtained from bus transactions. • The bus model monitors bus transitions and counts bus switching. (Figure: system-level simulation in which CPU, memory, DSP, and IP connect through a bus model that monitors bus transitions.)
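Counting bus switching can be sketched as an XOR between successive bus values. The result is a per-wire switching activity α that plugs into the per-wire power P = ½ · Vdd² · C · f · α. The bus trace is invented, and the helper names are assumptions.

```python
def toggle_counts(words, width):
    """Count transitions on each of `width` bus wires over a sequence of words."""
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for bit in range(width):
            counts[bit] += (diff >> bit) & 1
    return counts


def switching_activity(words, width):
    """Per-wire alpha: toggles divided by the number of observed transitions."""
    cycles = len(words) - 1
    return [count / cycles for count in toggle_counts(words, width)]


def wire_power(vdd, cap, freq, alpha):
    """P = 1/2 * Vdd^2 * C * f * alpha for one wire."""
    return 0.5 * vdd ** 2 * cap * freq * alpha


# Hypothetical 4-bit bus trace captured by the bus model.
trace = [0b0000, 0b0011, 0b0001, 0b0000]
counts = toggle_counts(trace, 4)
alphas = switching_activity(trace, 4)
```

Summing `wire_power` over all wires with their own α and C gives the interconnection power for the monitored run.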
Bus Power Estimation • Power estimation • An application example is simulated in a system-level simulator. • The power estimator reports power consumption using the power models of the bus components and interconnection. • Values monitored from bus transitions are used as the input of the power estimator. (Figure: system-level simulator with CPU, memory, DSP, and IP on a bus model that feeds the power estimator.)
Contents • Low Power System Implementation Techniques • Circuit level • Clock gating • MTCMOS • Multiple voltage supply • Architecture level • Memory Optimization • Bus Optimization • Dynamic Power Management in System Level • Introduction to DPM • Structure of DPM • Component-level DPM scheme • DPM Policy • Dynamic Voltage Scaling
Circuit Level Low Power System Implementation Techniques • Clock gating • Most popular method for power reduction of clock signals • Need circuit to generate enable signal • Increases complexity of control logic • Timing critical to avoid clock glitches at AND gate output • Additional gate delay on clock signal
Circuit Level Low Power System Implementation Techniques • MTCMOS • Low-VTH devices in the logic maintain performance when active. • A high-VTH current switch (header or footer) cuts off the leakage path in sleep mode. • The scheduling algorithm that controls the sleep signal is important. (Figure: logic between virtual VDD and virtual GND; a sleep-controlled header connects VDD to virtual VDD and a sleep-controlled footer connects virtual GND to ground.)
Circuit Level Low Power System Implementation Techniques • Multiple Voltage Supply • Slows down non-critical paths with a lower supply voltage • Two or more power grids • Needs high-efficiency voltage converters for dynamic voltage scaling • The dynamic power scheduling algorithm is important. (Figure: dataflow graph of *, + and - operations; labels: critical path, needs high-speed logic; low voltage supply; high voltage supply.)
Architecture Level Low Power System Implementation Techniques • Memory Optimization • Code density optimization • Goal • Minimize program memory occupation to reduce the bandwidth of processor-memory communication • Approaches • Custom instruction sets • Object code compression
Memory Optimization • Custom instruction set • Instructions are shorter than those of the regular instruction set • Example: ARM Thumb code (16-bit instructions) • Requires architectural support for 16-bit instructions • In this case, five instructions that would occupy five 32-bit words fit in three, so fetch bandwidth drops to 3/5. (Figure: Inst 1 to Inst 5 stored one per 32-bit word versus packed two per 32-bit word.)
Memory Optimization • Object code compression • All instructions have the same size, but some or all of them are stored in instruction memory in encoded form. • A solution available for embedded processors • No special architecture for a different type of instruction is needed. • Exploits the small subset of instructions used by firmware code • Approaches • Full code compression • Selective code compression
Memory Optimization • Full code compression • Replace all instructions with binary patterns of minimum width. • $\lceil \log_2 N \rceil$ bits, where N is the number of distinct instructions • Advantage • Memory bandwidth for instructions is decreased. • Disadvantage • The IDT may be very large because N is not small. • $\log_2 N$ may not be a multiple of 8. (Figure: without compression the core fetches k-bit instructions from memory; with compression the memory returns log2 N-bit codes that the IDT expands to k-bit instructions.) IDT: Instruction Decompression Table
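Full code compression can be sketched as building the IDT from the distinct instruction words of a program and replacing each word by its table index. The instruction words below are arbitrary example values, and the helper names are assumptions.

```python
def build_idt(program):
    """Return (idt, codes): the table of distinct words and per-instruction indices."""
    idt = []
    index_of = {}
    codes = []
    for word in program:
        if word not in index_of:
            index_of[word] = len(idt)
            idt.append(word)
        codes.append(index_of[word])
    return idt, codes


def code_width(n_entries):
    """Minimum index width in bits: ceil(log2 N), and at least 1."""
    return max(1, (n_entries - 1).bit_length())


def decompress(idt, codes):
    """What the IDT does on fetch: map each code back to its instruction word."""
    return [idt[code] for code in codes]


# Arbitrary 32-bit example words; three are distinct.
program = [0xE1A00000, 0xE2811001, 0xE1A00000, 0xE3A02005, 0xE2811001]
idt, codes = build_idt(program)
width = code_width(len(idt))
```

Here five 32-bit fetches shrink to five 2-bit codes, at the price of a table that grows with the number of distinct instructions.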
Memory Optimization • Selective Code Compression • Most of a program trace is covered by a small subset of instructions. • Compress only that subset: the instructions that maximize program coverage • The program is a mix of compressed and uncompressed instructions. (Figure: memory, an IDT indexed by 8-bit codes, a buffer, and a controller feed k-bit instructions to the core.)
Memory Optimization • Advantage • Size of IDT is fixed and limited. • Instruction fetching/decompression logic has reduced complexity. • Disadvantage • Requires a controller to handle instruction fetching
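Selective compression with a fixed-size IDT can be sketched as picking the most frequently executed instruction words and measuring how much of the trace they cover. The 256-entry default corresponds to 8-bit codes; the trace and names are invented for illustration.

```python
from collections import Counter


def select_idt(trace, table_size=256):
    """Pick the `table_size` most frequently executed instruction words."""
    return [word for word, _ in Counter(trace).most_common(table_size)]


def coverage(trace, idt):
    """Fraction of executed instructions that can be fetched in compressed form."""
    in_table = set(idt)
    return sum(1 for word in trace if word in in_table) / len(trace)


# Hypothetical execution trace dominated by two instructions.
trace = ["a"] * 6 + ["b"] * 3 + ["c"]
idt = select_idt(trace, table_size=2)
covered = coverage(trace, idt)
```

The controller on the fetch path then has to tell compressed codes from uncompressed words, which is the cost listed under Disadvantage.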
Memory Optimization • Data density optimization • Same principle as code density optimization • For the purpose of reducing memory traffic • dynamic size of the data-set • More complex than code compression, because both compression and decompression are required • Hardware compression/decompression unit needed • Design trade-off between speed and power
Architecture Level Low Power System Implementation Techniques • Bus power optimization • A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip busses. • Reduce switching activity on busses via signal encoding to save power • Approaches • Bus-invert coding • Gray code addressing • $P_{\text{Bus}} = n \times C \times V_{dd}^2 \times \mathit{freq} \times \mathit{activity}$, for an n-bit bus
Bus Optimization • Bus-invert coding • Add a redundant line INV to the bus • When INV = 0 • Data is equal to the remaining bus lines • When INV = 1 • Data is the complement of the remaining bus lines • At each cycle, decide whether sending the true or the complemented signal leads to fewer toggles (Figure: source data passes through polarity decision logic onto the data bus together with the INV signal, and is restored as received data.)
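The polarity decision can be sketched as follows: compare each new word with what is currently on the bus and send the complement when more than half of the data lines would toggle. Starting the bus at all zeros and not counting the INV line's own toggles are simplifying assumptions.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, inv) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev_bus = 0  # assumption: bus lines start at 0
    encoded = []
    for data in words:
        toggles = bin((data ^ prev_bus) & mask).count("1")
        if toggles > width / 2:
            bus, inv = (~data) & mask, 1   # send the complement
        else:
            bus, inv = data & mask, 0      # send the true value
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Receiver side: undo the inversion whenever INV = 1."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


words = [0x00, 0xFF, 0x0F]
encoded = bus_invert_encode(words, 8)
decoded = bus_invert_decode(encoded, 8)
```

With this rule no more than half of the data lines toggle in any cycle, at the cost of one extra wire and the decision logic.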
Bus Optimization • Gray code addressing • Most instruction addresses are consecutive • Use Gray code for addresses • Word-oriented machines • Addresses increment by 4 (32-bit) or by 8 (64-bit) • Modify the Gray code so that 1 bit switches per increment • A Gray code adder is needed for jumps
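A minimal sketch of Gray code addressing for a word-oriented machine follows. Gray-encoding only the word index and passing the byte-offset bits through unchanged is one assumed way to get a single bit change per increment; the slide does not say how the code is modified.

```python
def binary_to_gray(n):
    """Standard reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def word_gray_address(addr, shift=2):
    """Gray-encode the word index; keep the low `shift` offset bits as they are.

    shift=2 matches increments of 4 (32-bit words); use shift=3 for 64-bit.
    """
    low = addr & ((1 << shift) - 1)
    return (binary_to_gray(addr >> shift) << shift) | low


def hamming(a, b):
    """Number of differing bits, i.e. bus lines that toggle."""
    return bin(a ^ b).count("1")


addresses = list(range(0, 64, 4))  # consecutive 32-bit instruction fetches
encoded = [word_gray_address(a) for a in addresses]
toggles = [hamming(x, y) for x, y in zip(encoded, encoded[1:])]
```

Sequential fetches then toggle exactly one address line each, while plain binary addressing toggles several lines at every carry.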
Introduction to DPM • Dynamic Power Management (DPM) • DPM controls the power consumption of components based on their usage. • Prediction of component usage is essential. • Methods • Shutdown (clock gating, power gating) • Slowdown (frequency scaling, voltage scaling, VTH scaling) (Figure: shutdown runs at full f and VDD for T/2 and then idles; slowdown runs for the whole period T at a lower f and 0.6 VDD.)
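The figure labels on this slide (T/2, T, VDD, 0.6 VDD) suggest a comparison between finishing early and shutting down versus running slower for the full period. As a worked illustration under assumed conditions (dynamic power only, $P = C V_{DD}^2 f$, zero power while idle, and half frequency at $0.6\,V_{DD}$), the two energies compare as:

```latex
\begin{align*}
E_{\text{shutdown}} &= C\,V_{DD}^2\,f \cdot \frac{T}{2} = 0.5\,C\,V_{DD}^2\,f\,T \\
E_{\text{slowdown}} &= C\,(0.6\,V_{DD})^2\,\frac{f}{2} \cdot T = 0.18\,C\,V_{DD}^2\,f\,T \\
\frac{E_{\text{slowdown}}}{E_{\text{shutdown}}} &= 0.36
\end{align*}
```

Under these assumptions the same number of cycles costs 36% of the energy, because energy per cycle scales with the square of the supply voltage.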
Structure of DPM • Levels of embodiments of DPM • Component level • Circuit, Block • Power mode • System level • Policy • The procedure that controls the power level of each module in a system (Figure: the system policy exchanges power mode and request signals with Block 1 … Block n, each of which contains several circuits.)
Component Level DPM Scheme • Circuit level • Clock off by clock gating • Power off by the footer/header of MTCMOS • Multiple voltage supply • Block level • Power off by shutting down the power supply to IPs • When the power-off patterns of two blocks are similar, shut them down together. (Figure: IP #1 and IP #2 share a virtual VDD and virtual GND, switched from the VDD source and GND source.)
Component Level DPM Scheme • Power mode • Each state has a combination of enabled DPM techniques. • Ex) a system that uses clock gating and block shutdown • Transitions between modes of operation have a cost. (Figure: power state machine for the StrongARM processor. States: Run, P = 400 mW; Idle, P = 50 mW; Sleep, P = 0.16 mW. Transition labels: 10 µs, 10 µs, 90 µs, 90 µs, 160 ms; "Wait for interrupt"; "Wait for wake-up event".) SA-100 Microprocessor Technical Reference Manual, Intel, 1998
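Because transitions have a cost, a lower-power state only pays off if the component stays there long enough. The sketch below computes that break-even time from the state powers on this slide. Two things are assumptions: that entering Sleep takes 90 µs and waking takes 160 ms (the flattened figure does not say which label belongs to which edge), and that the processor draws Run power during transitions.

```python
# State powers from the slide, in watts.
POWER_W = {"run": 0.400, "idle": 0.050, "sleep": 0.00016}


def break_even_time(p_stay, p_low, p_transition, t_transition):
    """Shortest inactive period for which entering the low-power state saves energy.

    Staying costs p_stay * T. Leaving costs p_transition * t_transition for the
    round trip plus p_low for the remaining T - t_transition.
    """
    t_be = t_transition * (p_transition - p_low) / (p_stay - p_low)
    return max(t_be, t_transition)


# Assumed round trip Idle -> Sleep -> Run: 90 us to enter, 160 ms to wake.
t_round_trip = 90e-6 + 160e-3
t_be = break_even_time(
    p_stay=POWER_W["idle"],
    p_low=POWER_W["sleep"],
    p_transition=POWER_W["run"],
    t_transition=t_round_trip,
)
```

Under these assumptions the break-even time is a little under 1.3 s, so a policy would want to predict an inactive period at least that long before choosing Sleep over Idle.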