
A First-step Towards an Architecture Tuning Methodology for Low Power

A First-step Towards an Architecture Tuning Methodology for Low Power. Roman Lysecky Department of IP Management Conexant Newport Beach. Greg Stitt, Frank Vahid*, Tony Givargis Dept. of Computer Science & Engineering University of California, Riverside


Presentation Transcript


  1. A First-step Towards an Architecture Tuning Methodology for Low Power
     Roman Lysecky, Department of IP Management, Conexant, Newport Beach
     Greg Stitt, Frank Vahid*, Tony Givargis, Dept. of Computer Science & Engineering, University of California, Riverside
     *Also with the Center for Embedded Computer Systems, UC Irvine
     This work was supported by the National Science Foundation under grants CCR-9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship.

  2. Introduction: advent of cores
     • In the past, board-level embedded systems were built using discrete ICs
     • Today, single-IC systems are increasingly being built using IPs (Intellectual Property)
       • A.k.a. “cores”
       • Hard core: layout
       • Firm core: structure (HDL)
       • Soft core: synthesizable behavior (HDL)
     • “System-on-a-chip” (SOC)
     [Figure labels: Board (Processor, Memory, Peripheral); Core library (PeripheralA, PeripheralB, ProcessorX); IP cores (Processor, Mem, Peripheral)]

  3. Introduction: embedded systems
     • SOCs implementing an embedded system have a unique feature
       • Each implements a particular application
       • Thus, the processor may execute a single fixed program that never changes
       • Unlike desktop systems, which execute a variety of programs
       • Examples: digital camera, automobile cruise-controller
     • We can exploit this fixed-program feature
       • For example, by using mask-programmed ROM
       • But much more can be done

  4. Introduction: architecture tuning
     • Architecture tuning
       • A way to exploit the fixed-program feature of embedded systems
       • First, do architecture design for the particular application
       • Then, “tune” the core-based system architecture to the particular application program, before IC fabrication
       • Goals: better performance, power, size
     [Figure: flow from the core library (PeripheralA, PeripheralB, ProcessorX) and the fixed program through architecture design, architecture tuning (HDL, yielding tuned cores), and fabrication to the IC]

  5. Introduction: architecture tuning
     • Examples of tuning optimizations
       • Memory hierarchy: no cache, L1 cache, L1+L2 cache
       • Cache organization: size, associativity, line size
       • Bus structure, data/address encoding
       • Microprocessor optimizations
         • Internal small-loop table
         • Controller partitioning
         • Datapath shortcuts
         • Register file copies

  6. Introduction: Tuning is a special case of Y-Chart iteration
     • Philips/TriMedia approach of simultaneously developing architecture and its applications
     [Figure: Y-chart with nodes Architecture, Applications, Mapping, Analysis, and Numbers; one part of the chart is annotated “Our focus”]

  7. Problem description
     • Focus of this work:
       • Tuning a microcontroller to its program
       • Goal is reduced power without performance loss
     • Restrict tuning to maintain exact instruction set compatibility
       • No instructions may be added or deleted
       • Thus, no modification to software development environment
       • Also, no problems with porting software to/from other versions of the microcontroller
       • Instruction set incompatibility can be a show stopper

  8. Previous work
     • Application-specific instruction-set processors [Fisher99]
       • Customize a microprocessor to its application(s)
       • e.g., Tensilica
       • Customized instruction-set, requiring customized tools
     • Tuning compiler to architecture [Tiwari et al 94]
     • Architectural description languages to inform compiler of architecture features [Halambi et al 99]
     • Tuning cache and cache/bus organization to application [Givargis et al 99]

  9. Tuning environment
     • Currently for the 8051 microcontroller
       • Starts from VHDL synthesizable model of 8051 (soft core)
       • Uses Synopsys synthesis, simulation and power analysis
       • Uses 8051 instruction-set simulator
       • Uses numerous scripts
     • Goal of the environment
       • Understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power
     • Three main tools
       • Architectural view
       • Instruction-set view
       • Program/data memory view

  10. Tuning environment: architectural view tool
      [Figure: tool flow with the following labels]
      • Inputs: microprocessor soft core, program binary
      • RT-synthesizer (output: microprocessor structure) and ROM generator (output: ROM entity)
      • Simulator and power analyzer (output: “flat” power data)
      • Structural hierarchical power data translator and xdu display
      Sample output:
      Module    Power (mW)
      ROM       1.04
      ALU       1.62
      RAM       1.42
      CTRL      2.69
      DECODER   0.07
      Total     7.66

  11. Tuning environment: instruction-set view tool
      [Figure: tool flow with the following labels]
      • Inputs: binaries to exercise instruction 1, instruction 2, instruction 3, and so on
      • ROM generator (output: ROM entity), combined with the microprocessor structure
      • Simulator and power analyzer (output: flat power data for instruction 1, instruction 2, instruction 3, and so on)
      • Power data collector, structural power data translator, and xdu display
      Sample output:
      Instruction   Power (mW)
      ADDC_1        7.340834
      ADD_1         7.350741
      ANL_1         6.631394
      CLR_1         3.76228
      CPL_1         5.481627
      DA            5.28897
      DEC_1         5.368807
      DIV           7.716592
      INC_1         4.662862
      MOVC_1        6.078014
      MOVC_2        5.021021
      MOV_1         5.577664
      MOV_2         6.164267
      MUL           5.522886
      NOP           4.900275
      ORL_1         6.954121
      POP           8.103867
      PUSH          8.7116

  12. Tuning environment: program/data memory view tool
      [Figure: tool flow with the following labels]
      • Inputs: per-instruction power data, program binary
      • Instruction-set simulator (output: program/data memory access frequencies and power)
      • Program hierarchy power translator and xdu display
      Sample output, program memory:
      Addr    Ins     Freq   Pwr       Freq*Pwr
      00000   LJMP    1      0         0
      00003   MOV_9   108    5.46067   589.752
      00005   MOV_9   108    5.46067   589.752
      00007   MOV_9   108    5.46067   589.752
      00009   MOV_9   108    5.46067   589.752
      00011   RET     108    0         0
      00012   MOV_9   27     5.46067   147.438
      00014   MOV_9   27     5.46067   147.438
      00016   MOV_9   27     5.46067   147.438
      00018   MOV_9   27     5.46067   147.438
      00020   MOV_4   27     4.83507   130.547
      00022   LCALL   27     0         0
      Sample output, data memory:
      Addr    Purpose   Accesses
      00128   P0        1311
      00129   SP        70317
      00130   DPL       31189
      00131   DPH       7977
      00144   P1        161
      00208   PSW       413527
      00224   ACC       360949
      00240   B         2598
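The Freq*Pwr column in the program memory table is the execution frequency of each instruction multiplied by its per-instruction power. A minimal Python sketch of that arithmetic follows, using the rows from this slide. The function names `weighted_power` and `hottest` are illustrative assumptions and are not part of the authors' tools; the product is treated here only as a ranking score.

```python
# Rows copied from the program memory table on this slide:
# (address, instruction, execution frequency, per-instruction power in mW).
PROGRAM_ROWS = [
    (0, "LJMP", 1, 0.0),
    (3, "MOV_9", 108, 5.46067),
    (5, "MOV_9", 108, 5.46067),
    (7, "MOV_9", 108, 5.46067),
    (9, "MOV_9", 108, 5.46067),
    (11, "RET", 108, 0.0),
    (12, "MOV_9", 27, 5.46067),
    (14, "MOV_9", 27, 5.46067),
    (16, "MOV_9", 27, 5.46067),
    (18, "MOV_9", 27, 5.46067),
    (20, "MOV_4", 27, 4.83507),
    (22, "LCALL", 27, 0.0),
]


def weighted_power(rows):
    """Return (address, instruction, freq * power) for every row, in order."""
    return [(addr, ins, freq * pwr) for addr, ins, freq, pwr in rows]


def hottest(rows, count=3):
    """Return the `count` rows with the largest freq * power product.

    Hypothetical helper: the slide does not say how the tool ranks rows.
    """
    ranked = sorted(weighted_power(rows), key=lambda row: row[2], reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    for addr, ins, score in hottest(PROGRAM_ROWS):
        print(f"{addr:05d} {ins:<6} {score:.3f}")
```

Ranking by the product rather than by power alone brings out instructions that are cheap individually but executed often, which is the kind of hot spot the view is meant to expose.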

  13. Tuning environment
      Inputs: program binary, microprocessor core
      Tool                               Run time   Output
      Instruction-set power view tool    1 day      Instruction-set power data
      Program/data memory view tool      seconds    Program power data
      Architectural view tool            1 hour     Architecture power data

  14. Design flow using the tuning environment
      [Flowchart: change the application or change the architecture; run the program/data memory view tool, the architecture view tool, and the instruction-set view tool; Satisfied? If no, repeat; if yes, DONE]
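The design flow on this slide is an iterate-until-satisfied loop. The sketch below expresses that control flow in Python under stated assumptions: `run_views`, `satisfied`, and `apply_change` are hypothetical callbacks standing in for the view tools, the designer's judgment, and a manual change to the application or architecture. The `max_iterations` guard is an addition for safety and does not appear in the slide, and the sketch runs the view tools once before the first change, which the flowchart does not specify.

```python
from typing import Callable, Dict

Report = Dict[str, float]


def tune(run_views: Callable[[], Report],
         satisfied: Callable[[Report], bool],
         apply_change: Callable[[Report], None],
         max_iterations: int = 10) -> Report:
    """Repeat 'run the view tools, check, change' until satisfied.

    All three callbacks are placeholders; in the slide these steps are
    performed by the tuning environment and by the designer.
    """
    report = run_views()
    for _ in range(max_iterations):
        if satisfied(report):
            break
        apply_change(report)    # change the application or the architecture
        report = run_views()    # re-run the view tools on the new design
    return report


if __name__ == "__main__":
    # Toy stand-in for a design: the numbers are made up for illustration.
    design = {"total_mw": 8.0}

    def lower_power(report: Report) -> None:
        design["total_mw"] -= 0.25

    final = tune(run_views=lambda: dict(design),
                 satisfied=lambda report: report["total_mw"] <= 7.5,
                 apply_change=lower_power)
    print(final)
```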

  15. Sample tuning optimization
      • Observation
        • RAM consumes much power
        • Address 224 accessed frequently
      • Possible tuning optimization
        • Replace this RAM location by a register inside the CTRL module
      • Steps
        • Modify VHDL model
        • Run all three view tools
      • Results
        • Power reduction: 7.67 to 7.27 mW
      Architectural view:
      Module    Power (mW)
      ROM       1.04
      ALU       1.62
      RAM       1.42
      CTRL      2.69
      DECODER   0.07
      Total     7.66
      Data memory view:
      Addr    Purpose   Accesses
      00128   P0        1311
      00129   SP        70317
      00130   DPL       31189
      00131   DPH       7977
      00144   P1        161
      00208   PSW       413527
      00224   ACC       360949
      00240   B         2598
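The observation on this slide comes from reading the data memory access counts, and the result is a before/after power figure. The sketch below, in Python, ranks the eight listed locations by access count and computes the relative reduction from the two quoted totals. The function names are illustrative assumptions, and the shares are taken only over the locations shown in the table, not over all of RAM.

```python
# Access counts copied from the data memory table on this slide:
# address -> (purpose, number of accesses).
DATA_ACCESSES = {
    128: ("P0", 1311),
    129: ("SP", 70317),
    130: ("DPL", 31189),
    131: ("DPH", 7977),
    144: ("P1", 161),
    208: ("PSW", 413527),
    224: ("ACC", 360949),
    240: ("B", 2598),
}


def access_shares(accesses):
    """Return (address, purpose, share of listed accesses), largest first."""
    total = sum(count for _, count in accesses.values())
    shares = [(addr, name, count / total)
              for addr, (name, count) in accesses.items()]
    return sorted(shares, key=lambda item: item[2], reverse=True)


def relative_reduction(before_mw, after_mw):
    """Fractional power saving going from before_mw to after_mw."""
    return (before_mw - after_mw) / before_mw


if __name__ == "__main__":
    for addr, name, share in access_shares(DATA_ACCESSES):
        print(f"{addr:05d} {name:<4} {share:6.1%}")
    # Totals quoted in the Results bullet of this slide.
    print(f"saving: {relative_reduction(7.67, 7.27):.1%}")
```

Among the listed locations, PSW (address 208) and ACC (address 224) account for most of the accesses, and the quoted change from 7.67 to 7.27 mW corresponds to a saving of roughly 5%.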

  16. Some recent data
      • Applied the tuning environment for a particular application
      • Converted two frequently-accessed RAM locations to registers
        • 15% total power savings
      • Introduced datapath shortcuts for the two most common register-to-register moves of the application, thus bypassing the ALU
        • 10% total power savings
      • Partitioned the controller into two, one small one implementing the frequently-executed instructions
        • 10-15% power savings, but we expect much more if we do a better job partitioning the design

  17. Conclusions
      • Described an environment for tuning a microprocessor to its application for low power
        • Full instruction set compatibility
        • Multiple views help find power hogs
        • Fully automated
      • Focus is now on developing tuning optimizations
        • Controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc.
      • Investigate possibility of automating tuning optimizations, develop more general tuning methodology
      • Environment for the 8051 is available on the web:
        • http://www.cs.ucr.edu/~dalton
