Profiles in Power: Optimizing Real-Time Systems for Power as well as Speed (IPS), Response Latency and Cost
Graham Hellestrand, Mahdi Seddighnazhad, James Brogan
VaST Systems Technology Corporation
Wireless Trends
• Key Focus: Low Cost, Power Reduction and Increased Features
• Competitive positions must be maintained
• Product complexity is increasing
• Hardware growth
• Software growth
• Critical Program Schedules
• Market windows must be hit
• Revenue opportunities must be captured
• Burden has moved to design and development
CONFIDENTIAL
The Metric: Power
Reducing power regardless of the effect on other optimization factors is of limited value.
• Example:
• Saving 50% power
• While yielding:
• 50% speed hit and/or
• Failure to meet response latency specifications
is likely to be unacceptable in the marketplace.
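A rough worked example of why the 50% / 50% case is unattractive, sketched in Python. It assumes a fixed workload and reads a "50% speed hit" as throughput halving, so the same task takes twice as long. The function name and the figures are illustrative and do not come from the slides.

```python
def energy_per_task(power_watts: float, task_seconds: float) -> float:
    """Energy in joules needed to finish one task: power multiplied by time."""
    return power_watts * task_seconds


# Hypothetical baseline design: 1 W for a task that takes 2 s.
baseline = energy_per_task(power_watts=1.0, task_seconds=2.0)

# Hypothetical "optimized" design: power is halved, but throughput is halved
# too, so the same task now takes 4 s.
slowed = energy_per_task(power_watts=0.5, task_seconds=4.0)

print("baseline energy (J):", baseline)
print("slowed energy (J):  ", slowed)
```

Under these assumptions the energy spent per task does not change, while every task finishes later, so a power figure on its own is a poor metric.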
Implications
• Real-time software architecture and development need to be subject to rigorous optimization of an appropriate objective function, based on:
• Power
• Speed
• Event response latencies
• Examples: interrupts, exceptions
• Cost, approximated by:
• Cache sizes
• Memory sizes and hierarchies
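One possible shape for such an objective function, as a minimal Python sketch. The weighted-sum form, the treatment of response latency as a hard limit, and every name and number here (`Measurement`, `objective`, the weights, the candidate figures) are assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    power_mw: float          # average platform power
    mips: float              # sustained instruction rate
    worst_latency_us: float  # worst observed event response latency
    cost_units: float        # cost proxy, e.g. cache plus memory size


def objective(m: Measurement, weights, latency_limit_us: float) -> float:
    """Score a candidate design; lower is better.

    A design that misses the latency limit is rejected outright (assumption:
    latency is a hard constraint rather than a weighted term).
    """
    if m.worst_latency_us > latency_limit_us:
        return float("inf")
    w_power, w_speed, w_cost = weights
    return w_power * m.power_mw - w_speed * m.mips + w_cost * m.cost_units


# Hypothetical candidates measured on a virtual prototype.
candidates = {
    "A": Measurement(power_mw=200.0, mips=150.0, worst_latency_us=40.0, cost_units=64.0),
    "B": Measurement(power_mw=120.0, mips=100.0, worst_latency_us=80.0, cost_units=32.0),
    "C": Measurement(power_mw=150.0, mips=140.0, worst_latency_us=45.0, cost_units=48.0),
}
weights = (1.0, 1.0, 0.5)
best = min(candidates, key=lambda k: objective(candidates[k], weights, latency_limit_us=50.0))
print("best candidate:", best)
```

Candidate B draws the least power but is rejected because it misses the latency limit, which mirrors the point that power cannot be optimized in isolation.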
System Architecture & Optimization
• Software Architecture
• Platform Architecture
• Real-world interaction architecture
• Processor µ-architecture
+ Empirical experimentation
Architecture Addresses the Whole System
[Diagram labels: Architecture; Virtual Prototype; Evaluation, Exploration; Systems; Sub-systems; Platform; Structures; Buses & Bridges; Devices; VPMs & Peripheral Devices; RF, Mechanical, Physical; Software: Applications, Middleware, Comms, Operating Systems, Device Drivers; Hardware: Behav., RTL, Physical]
Optimization effect: Software Architecture & Design
1st Order Effect on system performance
Software Architecture & Design (UML, Simulink, C, C++, …)
[Diagram labels: Architecture; VSP; Hardware; Software; Create, Compile, Assemble, Link, Load, Debug + Monitor; HW; VaST VSP; SW IDE]
• Monitor prototype internals
• Cache hits/misses
• Bus transactions
• Processor performance
• Memory usage
• Interrupt latency
• Trigger hardware and software debuggers
• Example usage: analyze processor and platform power
• Make intelligent tradeoffs between power, performance and cost
Optimization effect: Platform Architecture & Design
1st Order Effect on system
Typical 3G Cell Phone Controller: 3 processors, 12 buses, 10 bus bridges, 70 peripherals
VaST Virtual System Prototype (model)
[Diagram labels: StarCore SC1400 Virtual Processor Model with D ROM and P ROM; ARM1176 (P1) and ARM1156 (P2) Virtual Processor Models, each with I Cache, D Cache and StdBus I/F; AHB buses; StdBus Bridges; Arb.; Ctrl; DRAM; Shared Memory; P1 Memory and P2 Memory; Memory Blocks; P1 Devices and P2 Devices: UART, TIMER, INTC; Console 1 and Console 2]
Optimization effect: Real-world Interaction Architecture
1st Order Effect on system
Automotive Power-train Control
[Diagram labels: Engine control unit; Real-time Engine Monitoring]
Igniting fuel under pressure at the wrong part of the cylinder stroke results in spectacular destruction of the engine (and maybe the experimenter).
Optimization of: Processor µ-architecture
2nd / 3rd Order Effect (apart from caches & buffering)
Generic Single Pipeline Operation
System Development Process
[Diagram labels: Business Requirements; Software Functional Requirements; CoMET System Level Design Tool; METeor; Executable System Specification + Virtual System Platform; Executable System Architecture (VSP); Software track and Hardware track, each: Translate, Architect and Test, Design and Test, Develop and Test; Concurrent, Iterative S/W – H/W Development; Integrate & CoVerify on the VSP; Silicon Hardware Platform + Embedded System Software; Integrate & CoVerify; Integrated & Optimized Final Product]
Electronic System Design Process
• System architecture: Virtual Prototype (timing accurate)
• Software || Hardware design: Virtual System Prototypes (high speed)
• Architecture Virtual Prototype: Evaluate architectures of candidate designs using real software applications
• Hardware development: Develop behavioral-level executable specification and verify RTL
• Software development: Design, develop and debug software before silicon or hardware prototypes are available
So What Performance can we get from a Timing Accurate VSP on a Single Processor Host?
That is, how useful are these things?
VSP Computation Performance: Multiple Independent Platforms
[Diagram: three identical platforms, each with an ARM926E VPM 1 (CONFIG & CONTROL, INST and DATA ports), GP INTC ARM, three Bridges, GP MEM blocks, GP TIMER, GP UART and GP CONSOLE]
Results - Computational Performance Study
Platform-dominated study: As Virtual System Prototypes (VSPs), whose processors have software and data resident in cache, are switched into the simulation (pink line), the sharing of host cycles between the processor and the hardware (purple line) of each VSP stays in proportion for each additional VSP activated. Frequent switching between VSPs, each with a processor and hardware that also share the host cycles, increases the simulation overhead (blue line).
VSP with TLM Bus Matrix
[Diagram labels: two SC1200 processors, each with a DMA Master and a Core Master; application software (Vocoder on one, Viterbi on the other) that on INT will shuffle data from DRAM to MemBanks; DRAM (2MB) Slave; OCP Channel Wrapper Slave; AHB; four DMA Traffic Generators (approx. 60% utilization, AHB-like transactions every 300-500 cycles); Bridges, with links labelled 32; Mem Bank 0 to Mem Bank 5 (512KB each)]
Results – Bus Matrix Performance
Communications and computation sharing study: This is a multi-variable study measuring the simulation performance of a system in which transactions of various sizes (1024, 64 and 4 bytes) are transmitted at a high rate over a complex switch to which two SC1200 processors are attached. Initially no processors are activated; each is then activated in turn. The bar chart is best read as a sequence of 3 pairs (Transaction / Headroom (MIPS)), going into the slide. As transactions become progressively smaller, the model has relatively more work to do to transmit and receive them. The Headroom measure is the amount of host cycles available for further simulation. As more processors are activated and the transaction size is reduced, the available headroom diminishes.
Study 4: VSP Interrupt Handling, Automotive Benchmark, Feb 2004
Capability of a VSP under interrupt loads: This is a relatively simple experiment that shows the performance of a single-processor Virtual System Prototype under increasingly stressful rates of processing asynchronous events (interrupts). Even at high interrupt rates (every 3,750 cycles is equivalent to a 12 cylinder engine running at 20,000 RPM and producing an interrupt every 10 degrees of crank-angle) the VPM is capable of simulating high software execution rates (4 MIPS) while handling the interrupts.
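The interrupt-rate figures can be cross-checked with a little arithmetic, sketched here in Python. The calculation assumes one interrupt per 10 degrees of crankshaft rotation in total, regardless of the cylinder count; the slide does not say whether the rate is per cylinder, and the clock frequency derived at the end is an inference, not a number given in the source. The function name is hypothetical.

```python
def interrupts_per_second(rpm: float, degrees_per_interrupt: float) -> float:
    """Interrupt rate for one interrupt every `degrees_per_interrupt` of crank angle."""
    # rpm * 360 gives degrees per minute; dividing by 60 converts to degrees
    # per second, and dividing by the interval gives interrupts per second.
    return rpm * 360.0 / (60.0 * degrees_per_interrupt)


rate = interrupts_per_second(rpm=20_000, degrees_per_interrupt=10)

# If each interrupt interval is 3,750 processor cycles (figure from the slide),
# the simulated clock implied by this reading is rate * 3,750 cycles per second.
implied_clock_hz = rate * 3_750

print("interrupts per second:", rate)
print("implied clock (MHz):  ", implied_clock_hz / 1e6)
```

Under this reading the engine produces 12,000 interrupts per second, which matches an interval of 3,750 cycles on a 45 MHz target clock.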
It is all about optimization, stupid!
[Diagram labels: Specifications; H-type Respecifier; Very Smart System Instantiator; Virtual Prototypes: 32-bit MPU, Clock Gen., Serial Comms, Interrupt Controller, A2D Convert, RAM, ROM, Flash, General I/O, Bus Interface, DMA, Virtual bus, Interrupt, Timer; Physical Prototype: Physical, Mechanical, RF, ..; Software; optimization targets: Power Consumption, Asynch-Signal Response Latency, Speed]
Typical 2.5G Wireless Systems built using a Virtual System Prototype
Virtual Prototyping: Mobile Handset Development
Full System Development: Architecture, Software, Hardware, I/F
[Diagram labels: ARM Debugger; TeakLite Debugger; SG2; Virtual COM Port; I Q Signals]
Wireless VP Benefits
[Diagram labels: ARM Debugger; TeakLite Debugger; SGOLD2 Architecture; Keypad Test Bench; Linux OS Execution + MPEG4 Encoding; Camera Input; Camera Test Bench; Win32 Terminal for all Serial IO; Virtual COM Ports; LCD Display QCIF/CIF]
• Early Design Feedback in Semiconductor Development Process
• Enabled 1st Pass Silicon Success
• Eliminated Costly 2nd Silicon
• Provided Complete Software Development Environment 9 Months Prior to Silicon
• Resulted in a Better Quality Product 5 Months Earlier Than Standard Development Process
• Advanced Debugging
• Multi-Core debugging
• ARM926 (ADS 1.2)
• TeakLite* (DSP group)
• Complete system visibility
• S-GOLD programmer model
• Bus status & Interrupt behavior
• System cycle count, monitors
• I/O Test Bench Support
• Open Model Extension
Concurrent Bus Activity
Optimizing for Power and Performance: Separated Functions
General Form of Multi-Objective Optimization Equation
Characterize an objective function in terms of events directly measurable from the VSP.
Problem: Huge volume of data, some of which may be highly correlated with other data, leading to multiple counting and unreliability in composite measures.
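One way to guard against multiple counting is to drop event counters that are strongly correlated with a counter already kept. The Python sketch below illustrates that idea with a Pearson correlation test; it is not the authors' method. The counter names, the sample values and the 0.95 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def prune_correlated(counters, threshold=0.95):
    """Keep a counter only if it is not strongly correlated with one already kept."""
    kept = {}
    for name, series in counters.items():
        if all(abs(pearson(series, other)) < threshold for other in kept.values()):
            kept[name] = series
    return kept


# Hypothetical per-run event counts collected from a virtual prototype.
counters = {
    "icache_miss": [10, 20, 30, 40],
    "bus_read": [21, 41, 59, 80],   # tracks icache_miss almost exactly
    "interrupts": [5, 1, 4, 2],
}
kept = prune_correlated(counters)
print("counters kept:", list(kept))
```

The result depends on the order in which counters are visited, so a real study would want a more principled selection; the sketch only shows why near-duplicate events should not each receive their own weight.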
A Simple Power Function for a Full Platform
Resolving the Weights for the Power Function
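This transcript keeps only the slide titles for the power function and its weights, so the following Python sketch rests on an assumption: that platform power is modelled as a weighted sum of event counts, and that the weights are calibrated by least squares against reference power figures. The function name, the event columns and all numbers are hypothetical.

```python
def fit_weights(event_counts, measured_power):
    """Least-squares weights w minimizing sum((row . w - power) ** 2).

    event_counts: one row per experiment, one column per event type.
    measured_power: one reference power figure per experiment.
    """
    k = len(event_counts[0])
    # Build the normal equations (A^T A) w = A^T p as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in event_counts) for j in range(k)]
        + [sum(r[i] * p for r, p in zip(event_counts, measured_power))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("event counts are linearly dependent")
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical calibration runs: columns are (cache misses, bus transactions),
# in arbitrary units; power figures were generated with weights 3.0 and 5.0.
runs = [[1, 0], [0, 1], [1, 1], [2, 1]]
power = [3.0, 5.0, 8.0, 11.0]
weights = fit_weights(runs, power)
print("fitted weights:", weights)
```

Event columns that are linearly dependent make the system singular, which is one concrete form of the correlated-data problem noted earlier in the deck.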
Single Task Working Set vs Cache Size Analysis
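To illustrate the kind of relationship such an analysis explores, here is a small Python sketch that replays an address trace through a fully associative LRU cache and reports the miss rate for several cache sizes. The cache organisation, the 32-byte line size and the synthetic looping trace are assumptions chosen for illustration; they are not the configuration measured in the slides.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_lines, line_bytes=32):
    """Miss rate of a fully associative LRU cache over a non-empty address trace."""
    cache = OrderedDict()  # keys are line numbers, ordered from oldest to newest use
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(addresses)


# Hypothetical trace: a task looping 10 times over a 4 KiB working set,
# reading one 4-byte word at a time.
trace = [a for _ in range(10) for a in range(0, 4096, 4)]
for lines in (32, 64, 128, 256):
    print(lines * 32, "byte cache, miss rate:", round(miss_rate(trace, lines), 4))
```

With this trace the miss rate stays high until the cache can hold the whole 4 KiB working set and then drops to the compulsory misses only, so cache capacity beyond the working set buys nothing for this task.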
Linux Boot - Memory Hierarchy Analysis (I&D cache + bus + bus bridge + Mem (DDR | SDR))
Replace Cache with Simple External Buffer for a Known Task Set
The Message
• System optimization needs a composite, complex optimization function of functions operating on a complete (model of a) system. The constituent functions include:
• Power
• Speed
• Response deadline compliance
• Cost
• ……
A rigorous scientific methodology is required for empirical experimentation.