Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores
Joerg Henkel, NEC C&C Research, Princeton, New Jersey
Tony Givargis, Frank Vahid*, Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
This work was supported by the National Science Foundation under grant # CCR-9876006, and by a Design Automation Conference graduate scholarship.
System-on-a-chip (SOC)
• Want to explore alternative cores, parameter settings, and applications
• Gate/RT-level simulation too slow
[Figure: an SOC (microprocessor, cache, memory, bridge, peripherals) running Application1 or Application2, with alternative peripheral cores (Peripheral1, Peripheral2, Peripheral2_a, Peripheral2_b, …) drawn from a core database]
SOC System-level Power Estimation
• Microprocessor: Tiwari/Malik/Wolfe 94 (instruction set simulator); Marculescu/Pedram 96 (instruction trace reduction)
• Plus cache, memory & bus: Simunic/Benini/DeMicheli 99 (extended instruction simulator); Givargis/Vahid/Henkel 99 (trace reductions)
• Still need system-level method for peripherals
• 3-step method
[Figure: SOC system-level model vs. SOC gate-level model, each with application, microprocessor, cache, memory, bridge, and peripherals]
Core Provider's Step 1: Instruction-based System-Level Model Creation
• A system simulation model is already commonly used, and is required by the VSIA standard
• Executes ~1000x faster than the gate-level model
[Figure: the core provider's UART model exposes the instructions Reset(), Enable_tx(), Enable_rx(), Send(), Receive(); it is stored in the core database alongside other cores (UART, JPEG decode, …)]
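To make "instruction-based system-level model" concrete, here is a minimal Python sketch of a UART model whose methods correspond to the five core instructions. The internal behavior (enable flags, a transmit log, and an `rx_fifo` queue that a testbench can fill with incoming data) is a hypothetical simplification chosen for illustration, not the core provider's actual model. The point is that nothing below the instruction level (clocks, shift registers, gates) is simulated, which is where the speedup comes from.

```python
from __future__ import annotations

from collections import deque


class UartModel:
    """Hypothetical instruction-level functional model of a UART core.

    Each public method corresponds to one core "instruction"; no gate- or
    RT-level detail is modeled.
    """

    def __init__(self, buffer_size: int = 2) -> None:
        self.buffer_size = buffer_size  # core parameter, in bytes
        self.reset()

    def reset(self) -> None:
        # Return the core to its initial state: transmitter and receiver off.
        self.tx_enabled = False
        self.rx_enabled = False
        self.tx_log: list[bytes] = []          # data "sent on the line"
        self.rx_fifo: deque[bytes] = deque()   # hypothetical hook: incoming data

    def enable_tx(self) -> None:
        self.tx_enabled = True

    def enable_rx(self) -> None:
        self.rx_enabled = True

    def send(self, data: bytes) -> None:
        if not self.tx_enabled:
            raise RuntimeError("send() called while transmitter is disabled")
        if len(data) > self.buffer_size:
            raise ValueError("data larger than the transmit buffer")
        self.tx_log.append(data)

    def receive(self) -> bytes | None:
        # Return the oldest pending data, or None if the receiver is off
        # or nothing has arrived.
        if not self.rx_enabled or not self.rx_fifo:
            return None
        return self.rx_fifo.popleft()
```

A system simulation would call these methods directly from the application model, e.g. `uart.enable_tx(); uart.send(b"hi")`.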
Core Provider's Step 2: Low-level Per-instruction Power Evaluation
• Measure the power of the gate/layout model, per instruction
• Use a unique testbench per instruction; this may take hours/days
• The low-level model differentiates cores from other SOC modules, enabling accurate power estimation
• Must account for core parameters (here, the UART buffer size)

UART instruction energy (μJ) vs. buffer size:
Instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset         13        13        14        14
Enable_tx     23        25        24        24
Enable_rx     18        19        19        19
Send          76        77        89        115
Receive       44        49        55        64
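Because energy depends on a core parameter, the characterization result is naturally a two-level table indexed by instruction and buffer size. The sketch below stores the Step 2 values that way; the dictionary layout and the `instruction_energy` helper are hypothetical choices, and the helper deliberately refuses uncharacterized sizes instead of interpolating.

```python
# Per-instruction energy of the UART core, in microjoules, indexed by
# instruction and by the buffer-size parameter (bytes). Values are copied
# from the Step 2 table; the layout is one possible choice.
UART_ENERGY_UJ: dict[str, dict[int, int]] = {
    "reset":     {2: 13, 4: 13, 8: 14, 16: 14},
    "enable_tx": {2: 23, 4: 25, 8: 24, 16: 24},
    "enable_rx": {2: 18, 4: 19, 8: 19, 16: 19},
    "send":      {2: 76, 4: 77, 8: 89, 16: 115},
    "receive":   {2: 44, 4: 49, 8: 55, 16: 64},
}


def instruction_energy(instruction: str, buffer_size: int) -> int:
    """Look up the measured energy (uJ) of one instruction for a buffer size.

    Only characterized sizes are supported; anything else raises KeyError
    rather than being interpolated.
    """
    per_size = UART_ENERGY_UJ[instruction]
    if buffer_size not in per_size:
        raise KeyError(f"buffer size {buffer_size} was not characterized")
    return per_size[buffer_size]
```

For example, `instruction_energy("send", 16)` gives 115, while `instruction_energy("send", 3)` raises `KeyError`.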
Core Provider's Step 3: Back Annotation of System Model
Instruction   Energy (μJ)
Reset         13
Enable_tx     23
Enable_rx     18
Send          76
Receive       44
Each instruction of the UART system model is annotated with its energy:
  Reset() … uJtot += 13
  Enable_tx() … uJtot += 23
  Enable_rx() … uJtot += 18
  Send() … uJtot += 76
  Receive() … uJtot += 44
The annotated UART model is then placed in the core database (UART, JPEG decode, …).
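A runnable rendering of the `uJtot += …` annotations might look like the sketch below. The class name `BackAnnotatedUart` is hypothetical, the functional behavior of each instruction is omitted, and the constants are the single per-instruction values from the Step 3 table (which match the 2-byte column of the Step 2 table).

```python
class BackAnnotatedUart:
    """Hypothetical back-annotated UART model: every instruction method adds
    its characterized energy to a running total, mirroring `uJtot += ...`."""

    def __init__(self) -> None:
        self.uJtot = 0  # accumulated energy in microjoules

    def reset(self) -> None:
        # (functional behavior of Reset omitted)
        self.uJtot += 13

    def enable_tx(self) -> None:
        self.uJtot += 23

    def enable_rx(self) -> None:
        self.uJtot += 18

    def send(self, data: bytes) -> None:
        # (functional behavior of Send omitted; `data` is unused here)
        self.uJtot += 76

    def receive(self) -> None:
        self.uJtot += 44
```

Calling `reset()`, `enable_tx()`, and then `send()` twice leaves `uJtot` at 13 + 23 + 76 + 76 = 188 μJ. The annotation adds only an integer addition per instruction, so it costs almost nothing on top of the functional model.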
Core "Power Modes" Requires Extra Effort by Core Provider
• Unlike in a microprocessor, certain peripheral core instructions can greatly modify the power consumption of other instructions
• Must create a power-mode transition function, and measure power per instruction per mode
Mode transitions: Mode 1 (Idle) → Mode 2 (Enabled) on Enable_tx or Enable_rx; Mode 2 (Enabled) → Mode 1 (Idle) on Reset

Mode 1: Idle, instruction energy (μJ) vs. buffer size:
Instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset         11        13        14        14
Enable_tx     27        32        31        31
Enable_rx     17        18        19        18
Send          17        19        19        20
Receive       14        15        17        18

Mode 2: Enabled, instruction energy (μJ) vs. buffer size:
Instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset         13        13        14        14
Enable_tx     23        25        24        24
Enable_rx     18        19        19        19
Send          76        77        89        115
Receive       44        49        55        64
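The power-mode idea can be sketched as a small state machine: a transition function following the Idle/Enabled diagram, plus one energy table per mode. One assumption is not stated on the slide and is made here only for illustration: each instruction is charged at the mode in effect *before* it executes, so `Enable_tx` issued from Idle costs the Idle-table value. The names `Mode`, `next_mode`, and `trace_energy` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    IDLE = 1
    ENABLED = 2


# Energy in microjoules per (mode, instruction, buffer size), from the
# two per-mode tables above.
ENERGY_UJ = {
    Mode.IDLE: {
        "reset":     {2: 11, 4: 13, 8: 14, 16: 14},
        "enable_tx": {2: 27, 4: 32, 8: 31, 16: 31},
        "enable_rx": {2: 17, 4: 18, 8: 19, 16: 18},
        "send":      {2: 17, 4: 19, 8: 19, 16: 20},
        "receive":   {2: 14, 4: 15, 8: 17, 16: 18},
    },
    Mode.ENABLED: {
        "reset":     {2: 13, 4: 13, 8: 14, 16: 14},
        "enable_tx": {2: 23, 4: 25, 8: 24, 16: 24},
        "enable_rx": {2: 18, 4: 19, 8: 19, 16: 19},
        "send":      {2: 76, 4: 77, 8: 89, 16: 115},
        "receive":   {2: 44, 4: 49, 8: 55, 16: 64},
    },
}


def next_mode(mode: Mode, instruction: str) -> Mode:
    """Power-mode transition function from the slide's state diagram."""
    if instruction in ("enable_tx", "enable_rx"):
        return Mode.ENABLED
    if instruction == "reset":
        return Mode.IDLE
    return mode  # send/receive leave the mode unchanged


def trace_energy(trace: list[str], buffer_size: int, start: Mode = Mode.IDLE) -> int:
    """Total energy (uJ) of an instruction trace. Each instruction is charged
    at the mode in effect before it executes (an assumption)."""
    mode, total = start, 0
    for instr in trace:
        total += ENERGY_UJ[mode][instr][buffer_size]
        mode = next_mode(mode, instr)
    return total
```

With a 2-byte buffer, the trace `send, enable_tx, send, reset, send` costs 17 + 27 + 76 + 13 + 17 = 150 μJ. A single-mode model that always used the Enabled table would charge the two idle `send`s at 76 μJ each, which illustrates why mode tracking matters.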
User Performs System Simulation, Which Yields Power Data
• Simulation takes only seconds or minutes
[Figure: the user assembles the SOC (application, microprocessor, cache, memory, bridge, peripherals) with the annotated UART model taken from the core database (UART, JPEG decode, …); simulation reports the total energy]
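From the user's side, the simulation simply replays the instructions the application issues to the core and sums the annotated energies. The sketch below assumes the Step 3 per-instruction values and a made-up application trace (reset, enable both directions, ten sends, three receives); the trace and the `simulate` helper are hypothetical.

```python
from collections import Counter

# Per-instruction UART energy (uJ) from the back-annotated model (Step 3 table).
UART_UJ = {"reset": 13, "enable_tx": 23, "enable_rx": 18, "send": 76, "receive": 44}


def simulate(trace):
    """Replay the UART instructions issued by an application and return
    (total energy in uJ, per-instruction energy breakdown)."""
    counts = Counter(trace)
    breakdown = {instr: n * UART_UJ[instr] for instr, n in counts.items()}
    return sum(breakdown.values()), breakdown


# Hypothetical application trace: initialize the UART, send 10 messages,
# and read back 3 replies.
app_trace = ["reset", "enable_tx", "enable_rx"] + ["send"] * 10 + ["receive"] * 3
total_uj, per_instr = simulate(app_trace)
print(f"UART total energy: {total_uj} uJ")  # prints "UART total energy: 946 uJ"
```

The breakdown makes it easy to see which instructions dominate; here `send` accounts for 760 of the 946 μJ.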
Results: Image-decode Accelerator
• Examined 3 peripheral cores: UART, DMA, JPEG
• Compared our instruction-based system-level method with gate-level simulation (slow but accurate) and "databook" RT-level simulation (cycle-accurate, using databook average-power values)

Energy (mJ), with error relative to gate-level in parentheses:
Core   Gate-level   "Databook" RT-level   Instr.-based system-level
UART   113          155 (37%)             115 (2%)
DMA    519          717 (38%)             493 (5%)
JPEG   1573         1793 (14%)            1550 (1%)

Simulation time: gate-level 40,980 sec; "databook" RT-level 2,700 sec; instruction-based system-level 14 sec
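The percentages above are relative errors against the gate-level reference, and the speedups follow from the simulation times. The short script below recomputes both from the reported numbers as a sanity check; the variable names are illustrative only.

```python
# Energy (mJ) reported on the "Image-decode Accelerator" slide.
GATE = {"UART": 113, "DMA": 519, "JPEG": 1573}       # reference
DATABOOK = {"UART": 155, "DMA": 717, "JPEG": 1793}   # RT-level + databook averages
SYSTEM = {"UART": 115, "DMA": 493, "JPEG": 1550}     # instruction-based system-level

# Simulation times (seconds) for the whole example.
TIME_S = {"gate": 40_980, "databook": 2_700, "system": 14}


def rel_error_pct(estimate: float, reference: float) -> float:
    """Absolute error of an estimate relative to the gate-level reference, in %."""
    return abs(estimate - reference) / reference * 100


for core in GATE:
    db = rel_error_pct(DATABOOK[core], GATE[core])
    sl = rel_error_pct(SYSTEM[core], GATE[core])
    print(f"{core}: databook {db:.0f}%, system-level {sl:.0f}%")

print(f"speedup vs gate-level: {TIME_S['gate'] / TIME_S['system']:.0f}x")
print(f"speedup vs databook RT-level: {TIME_S['databook'] / TIME_S['system']:.0f}x")
```

The recomputed errors round to the slide's values (37/2, 38/5, 14/1 percent), and the timing ratios come out to roughly 2,927x over gate-level and 193x over the databook RT-level run for this example.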
Results: Importance of Power Modes
• Proper power-mode selection is critical for peripheral cores
• Too few modes, or the wrong modes, can lead to large errors
UART example (gate-level energy: 113 mJ):
Power modes   System-level energy (mJ)   Error
Single-mode   86                         23.0%
Two-modes     104                        8.6%
Four-modes    115                        1.7%
Conclusions
• The introduced instruction-based method is accurate (less than 5% error), fast (1000x speedup over gate-level), and fits with current core-based methodology
• The concept of power modes is necessary for accuracy
• Future work includes a trace-simulator-based approach (10x speedup) and a trace-analysis-based approach (100x speedup)