L27: Lower Power Algorithm for Multimedia Systems. August 1999. Sungkyunkwan University, Jun-Dong Cho. http://vada.skku.ac.kr. Contents: Algorithmic Effects on Low Power; Low Power Management; Low Power Applications; Low Power Video Processor; Single Chip Video Camera; Vector Quantization; Data Encoding; CDMA Searcher
L27: Lower Power Algorithm for Multimedia Systems. August 1999. Sungkyunkwan University, Jun-Dong Cho. http://vada.skku.ac.kr
Contents • Algorithmic Effects on Low Power • Low Power Management • Low Power Applications • Low Power Video Processor • Single Chip Video Camera • Vector Quantization • Data Encoding • CDMA Searcher • Viterbi Decoder
Algorithm Selection • Example: 8x8 matrix DCT
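The effect of algorithm selection on the 8x8 DCT can be illustrated with a minimal sketch. It compares a direct 2-D evaluation with the separable row-column method and counts data multiplications for each. The function names and the test block are illustrative, and the count for the direct form assumes the 2-D basis products are precomputed constants.

```python
import math

N = 8

def dct_basis(k, n):
    # Orthonormal DCT-II basis value for frequency k and sample n.
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

C = [[dct_basis(k, n) for n in range(N)] for k in range(N)]

def dct2_direct(block):
    # Direct 2-D DCT: every output sums 64 weighted inputs.
    # Counted as one multiply per term (2-D basis assumed precomputed).
    mults = 0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += block[x][y] * (C[u][x] * C[v][y])
                    mults += 1
            out[u][v] = acc
    return out, mults

def dct1(vec):
    # 1-D 8-point DCT as a matrix-vector product: 64 multiplies.
    return [sum(C[k][n] * vec[n] for n in range(N)) for k in range(N)]

def dct2_row_column(block):
    # Separable method: 8 row transforms followed by 8 column transforms.
    rows = [dct1(r) for r in block]                                   # rows[x][v]
    cols = [dct1([rows[x][v] for x in range(N)]) for v in range(N)]   # cols[v][u]
    out = [[cols[v][u] for v in range(N)] for u in range(N)]
    mults = 2 * N * (N * N)  # 16 one-dimensional DCTs, 64 multiplies each
    return out, mults

block = [[(x * 8 + y) % 17 for y in range(N)] for x in range(N)]
direct, m_direct = dct2_direct(block)
separable, m_sep = dct2_row_column(block)
print(m_direct, m_sep)
```

Both functions compute the same transform; only the operation count differs. Fast 1-D DCT algorithms would reduce the row-column count further, but they are not modeled here.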
Strength Reduction: DIGLOG multiplier
                      1st Iter   2nd Iter   3rd Iter
Worst-case error      -25%       -6%        -1.6%
Prob. of Error < 1%   10%        70%        99.8%
With an 8 by 8 multiplier, the exact result can be obtained in at most seven iteration steps (worst case).
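The iterative approximation behind the table can be sketched as follows, assuming the usual DIGLOG decomposition A = 2^j + A_r and B = 2^k + B_r, so that A*B = 2^(j+k) + 2^j*B_r + 2^k*A_r + A_r*B_r. Each iteration uses only shifts and adds and defers the residual product A_r*B_r to the next iteration. The function name is illustrative, and the way iterations are counted here is an assumption that may differ from the slide's convention.

```python
def diglog_multiply(a, b, iterations):
    """Approximate a*b for non-negative integers using shifts and adds only."""
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # residual product is zero: the result is already exact
        j = a.bit_length() - 1       # position of the leading one of a
        k = b.bit_length() - 1       # position of the leading one of b
        a_r = a - (1 << j)           # residue of a below its leading one
        b_r = b - (1 << k)           # residue of b below its leading one
        # 2^(j+k) + 2^k * a_r + 2^j * b_r, all implemented as shifts
        result += (1 << (j + k)) + (a_r << k) + (b_r << j)
        a, b = a_r, b_r              # next iteration approximates a_r * b_r
    return result

for n in (1, 2, 3):
    approx = diglog_multiply(255, 255, n)
    print(n, approx, (approx - 255 * 255) / (255 * 255))
```

The approximation never exceeds the exact product, and the dropped term shrinks by at least a factor of four per iteration, which is consistent with the worst-case errors of -25%, -6% and -1.6% listed above.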
Logarithmic Number System --> Significant Strength Reduction
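A minimal sketch of why a logarithmic number system reduces operator strength: once values are stored as base-2 logarithms, multiplication and division become addition and subtraction. The helper names are illustrative, floating point stands in for the fixed-point log format a real design would use, and only positive values are handled.

```python
import math

def lns_encode(x):
    # Represent a positive value by its base-2 logarithm.
    return math.log2(x)

def lns_decode(l):
    # Convert a logarithmic value back to the linear domain.
    return 2.0 ** l

def lns_mul(la, lb):
    # Multiplication in the linear domain is addition in the log domain.
    return la + lb

def lns_div(la, lb):
    # Division in the linear domain is subtraction in the log domain.
    return la - lb

product = lns_decode(lns_mul(lns_encode(6.0), lns_encode(7.0)))
quotient = lns_decode(lns_div(lns_encode(42.0), lns_encode(7.0)))
print(product, quotient)
```

The trade-off is that addition and subtraction of linear values become the expensive operations in this representation.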
Switching Activity Reduction
(a) Average activity in a multiplier as a function of the constant value
(b) Parallel and serial implementations of an adder tree
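Switching activity can be estimated as the number of bit toggles between successive words. The sketch below counts toggles on the output of a constant multiplier for a given input sequence. It looks only at the product bus, not at internal multiplier nodes, and the function names and the 16-bit width are assumptions.

```python
def toggles(words):
    # Total number of bit transitions between consecutive words.
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))

def product_activity(constant, inputs, width=16):
    # Average toggles per transition on the output of (constant * input),
    # truncated to the given bus width.
    mask = (1 << width) - 1
    products = [(constant * x) & mask for x in inputs]
    return toggles(products) / max(len(products) - 1, 1)

inputs = [3, 12, 7, 0, 9, 5]
for constant in (1, 2, 85, 255):
    print(constant, product_activity(constant, inputs))
```

Running the loop for different constants gives a rough picture of how the constant value changes output activity for the same input stream.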
System-Level Solutions
• System management, system partitioning, algorithm selection
• Precompute the physical capacitance of the interconnect and the switching activity (number of bus accesses)
• Regularity: minimizes the power in the control hardware and the interconnection network
• Modularity: exploits data locality through distributed processing units, memories and control
• Spatial locality: an algorithm can be partitioned into natural clusters based on connectivity
• Temporal locality: short average lifetimes of variables (less temporary storage; data referenced in the recent past is likely to be accessed again soon)
• Few memory references: references to memories are expensive in terms of power
System-Level Solutions - cont. • Simulator: Instruction-level Energy Estimation • Software: Energy Efficient Algorithms • OS: Voltage Scheduling Algorithms • OS: Multiprocessing for Energy • Microprocessor: Dynamic Caches
Processor Systems: High Power
• Thinkpad (Pentium) → 0.3 Hours/AA
• InfoPad (ARM) → 0.8 Hours/AA
• Toshiba Portable (486) → 0.9 Hours/AA
• Newton (ARM) → 2.0 Hours/AA
• Operations per Battery Life: Minimize Energy Consumed per Operation
• Operations per Second: Maximize Throughput ≡ Operations/second
DPM vs SPM
• DPM (Dynamic Power Management): stops the clock, produced by the clock generators, from switching in a specific unit.
• SPM (Static Power Management): when the system remains idle for a significant period of time, it is shut down.
• Identify power-hungry modules and look for opportunities to reduce power
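The two policies can be sketched on a per-cycle activity trace: idle cycles are clock-gated (DPM style), and a long enough idle run triggers a shutdown (SPM style). The power values, the idle threshold and the function name are hypothetical, and wake-up cost is ignored.

```python
def simulate_power(activity, idle_shutdown=3,
                   p_active=1.0, p_gated=0.2, p_off=0.0):
    """Return (states, energy) for a list of per-cycle busy flags.

    All power numbers are hypothetical, in arbitrary units per cycle.
    """
    states = []
    energy = 0.0
    idle_run = 0
    for busy in activity:
        if busy:
            idle_run = 0
            state, power = "active", p_active
        else:
            idle_run += 1
            if idle_run > idle_shutdown:
                state, power = "off", p_off      # long idle: shut down
            else:
                state, power = "gated", p_gated  # short idle: stop the clock
        states.append(state)
        energy += power
    return states, energy

trace = [True, False, False, False, False, True]
states, energy = simulate_power(trace)
print(states, energy)
```

A more realistic model would charge energy and latency for leaving the off state, which is what makes the choice of the idle threshold a real trade-off.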
Vdd vs Delay
• Use variable voltage scaling or scheduling for real-time processing
• Use architecture optimization to compensate for slower operation, e.g., parallel processing to increase concurrency and pipelining to shorten the critical path
• Scale down device sizes to compensate for delay (interconnects do not scale proportionately and can become dominant)
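The trade-off between supply voltage and delay can be illustrated with the common first-order model, delay ∝ Vdd / (Vdd - Vt)^2 and dynamic power ∝ C * Vdd^2 * f. The sketch finds the supply at which a gate is twice as slow as at a reference supply, then estimates the power of a 2-way parallel design running at that supply. The threshold voltage of 0.7 V, the 5 V reference and the function names are assumptions, and routing and multiplexing overhead is ignored.

```python
def gate_delay(vdd, vt=0.7):
    # First-order delay model (arbitrary units): Vdd / (Vdd - Vt)^2.
    return vdd / (vdd - vt) ** 2

def vdd_for_slowdown(slowdown, vdd_ref=5.0, vt=0.7):
    # Bisection for the supply at which delay = slowdown * delay(vdd_ref).
    # Delay decreases monotonically with Vdd for Vdd > Vt.
    target = slowdown * gate_delay(vdd_ref, vt)
    lo, hi = vt + 1e-6, vdd_ref
    for _ in range(100):
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid   # too slow: raise the supply
        else:
            hi = mid   # fast enough: try a lower supply
    return hi

def relative_power_parallel(n_units, vdd_ref=5.0, vt=0.7):
    # n_units copies, each clocked at f / n_units, at the reduced supply:
    # (n * C) * V^2 * (f / n) relative to C * Vref^2 * f.
    vdd = vdd_for_slowdown(n_units, vdd_ref, vt)
    return (vdd / vdd_ref) ** 2

v2 = vdd_for_slowdown(2.0)
print(v2, relative_power_parallel(2))
```

Under this model the throughput is unchanged, since two units each run at half the clock rate, while the lower supply reduces power quadratically.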
PowerPC 603 Strategy
• Baseline: use the right supply and the right frequency for each part of the system. If the system has to wait for an input, only a small circuit needs to stay awake; it wakes up the main circuit when the input occurs.
• The PowerPC 603 is a 2-issue processor (2 instructions read at a time) with 5 parallel execution units.
• 4 modes:
• Full on mode for full speed
• Doze mode, in which the execution units are not running
• Nap mode, which also stops the bus clocking
• Sleep mode, which stops the clock generator, with or without the PLL (20-100 mW)
TI Structures
• Two DSPs, the TMS320C541 and TMS320C542, reduce power, chip count and system cost for wireless communication applications
• C54X DSPs, 2.7 V and 5 V, Low-Power Enhanced Architecture DSP (LEAD) family: with three different power-down modes, these devices are well suited for wireless communications products such as digital cellular phones, personal digital assistants and wireless modems, and for low-power voice coding and decoding
• The TMS320LC548 features:
• 15-ns (66 MIPS) or 20-ns (50 MIPS) instruction cycle times
• 3.0- and 3.3-V operation
• 32K 16-bit words of RAM and 2K 16-bit words of boot ROM on-chip
• Integrated Viterbi accelerator that performs the Viterbi butterfly update in four instruction cycles for GSM channel decoding
• Powerful single-cycle instructions (dual operand, parallel instructions, conditional instructions)
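The Viterbi butterfly update mentioned above is an add-compare-select (ACS) step applied to two next states that share the same two predecessor states. The following is a generic software sketch with minimum-distance metrics; it does not model the C54x accelerator or its instructions, and the function names and argument order are illustrative.

```python
def acs(pm0, bm0, pm1, bm1):
    # Add the branch metrics, compare the two candidates, select the survivor.
    # Returns (new path metric, index of the surviving predecessor).
    c0 = pm0 + bm0
    c1 = pm1 + bm1
    return (c0, 0) if c0 <= c1 else (c1, 1)

def butterfly(pm_i, pm_j, bm_i0, bm_j0, bm_i1, bm_j1):
    # One butterfly: predecessors i and j both feed next states 0 and 1.
    # bm_xy is the branch metric from predecessor x to next state y.
    next0 = acs(pm_i, bm_i0, pm_j, bm_j0)
    next1 = acs(pm_i, bm_i1, pm_j, bm_j1)
    return next0, next1

print(butterfly(3, 5, 1, 0, 2, 0))
```

A full decoder repeats this butterfly for every state pair at each trellis step and stores the survivor indices for traceback.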
InfoPad Architecture, UC-Berkeley
[Figure labels: InfoPad, Wireless Basestation, Internet, "PadServer", Speech Recognizer, Web Browser]
• Example: hand-held speech-enabled web browser
• Transmit audio and raw bitmaps across the wireless link
• Maintain state in the network, not on the Pad
• Perform all computation in the network to minimize client energy dissipation
InfoPad Hardware Flexibility
[Figure labels: Radio, RX Packet, Packet Header, 10 MIPS μProcessor, Frame Buffer, Frame-buffer update; microprocessor functions: Control, Statistics, Reliability, Debugging]
• Main data-flow handled by custom low-power ASICs
• Embedded software responsible for high-level functions
• Only header sent to microprocessor
• Entire packet routed to dedicated hardware
• Use hardware/software integration to provide energy-efficient high-level functionality
InfoPad Evolution
[Figure labels: Intercom, InfoPad]
Total Power: ~7 W. Where did the power go?
• Inefficient implementation
• Energy-efficient processors
• Commercial DC/DC
• No local computation?
• Commercial radios
• High-level system design optimizes the complete solution and drives new research