A Decompression Architecture for Low Power Embedded Systems Yi-hsin Tseng Date: 11/06/2007 Lekatsas, H.; Henkel, J.; Wolf, W., Proceedings of the 2000 International Conference on Computer Design (ICCD 2000), IEEE
Outline • Introduction & motivation • Code Compression Architecture • Decompression Engine Design • Experimental results • Conclusion & Contributions of the paper • Our project • Relation to CSE520 • Q & A
For Embedded Systems • Embedded system architectures are becoming more complicated. • Available memory space is limited. • A smaller executable program also indirectly affects the chip's… • Size • Weight • Power consumption
Why code compression/decompression? • Compress the instruction segment of the executable running on the embedded system • Reduces the memory requirements and bus transaction overhead • Instructions are compressed offline and decompressed on the fly at runtime
Related work on compressed instructions • A logarithmic-based compression scheme maps 32-bit instructions to fixed, smaller-width compressed instructions. • (results cover the memory area only) • Frequently appearing instructions are compressed to 8 bits. • (fixed lengths of 8 or 32 bits)
The compression method in this paper • Gives comprehensive results for the whole system, including • buses • memories (main memory and cache) • decompression unit • CPU
Architecture in this system (post-cache) • The decompression engine sits between the cache and the CPU, so the cache holds compressed code. • Why? • Increases the effective cache size • Improves instruction bandwidth
Code Compression Architecture • Uses SAMC (Semiadaptive Markov Compression) to compress instructions • Divides instructions into 4 groups • based on the SPARC architecture • A short 3-bit code is prepended to each compressed instruction to identify its group (a decoder dispatch sketch follows the group list below)
4 Groups of Instructions • Group 1 • instructions with immediates • Ex: sub %i1, 2, %g3 ; set 5000, %g2 • Group 2 • branch instructions • Ex: be, bne, bl, bg, ... • Group 3 • instructions with no immediates • Ex: add %o1,%o2,%g3 ; st %g1,[%o2] • Group 4 • Instructions that are left uncompressed
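The slide above lists the four instruction groups and the 3-bit tag that marks each compressed instruction. Below is a minimal C sketch of how a decompression engine could dispatch on such a tag; the tag values, the bit-reader, and the per-group decoder hooks are illustrative assumptions, not the paper's actual encoding.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal bit reader over the compressed instruction stream (illustrative). */
typedef struct {
    const uint8_t *buf;
    size_t bitpos;
} bitstream_t;

static uint32_t read_bits(bitstream_t *bs, unsigned n)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t byte = bs->bitpos >> 3;
        unsigned bit = 7u - (unsigned)(bs->bitpos & 7u);
        v = (v << 1) | ((bs->buf[byte] >> bit) & 1u);
        bs->bitpos++;
    }
    return v;
}

/* Hypothetical tag values: the paper does not list the actual 3-bit codes. */
enum { TAG_IMM = 0, TAG_BRANCH = 1, TAG_NOIMM = 2, TAG_RAW = 3 };

/* Read the 3-bit group tag prepended to each compressed instruction and
   hand the remaining bits to the matching per-group decoder (supplied by
   the caller in this sketch). Group 4 words are stored uncompressed. */
uint32_t decompress_next(bitstream_t *bs,
                         uint32_t (*decode_imm)(bitstream_t *),    /* group 1 */
                         uint32_t (*decode_branch)(bitstream_t *), /* group 2 */
                         uint32_t (*decode_noimm)(bitstream_t *))  /* group 3 */
{
    switch (read_bits(bs, 3)) {
    case TAG_IMM:    return decode_imm(bs);
    case TAG_BRANCH: return decode_branch(bs);
    case TAG_NOIMM:  return decode_noimm(bs);
    default:         return read_bits(bs, 32);  /* group 4: raw 32-bit word */
    }
}
```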
The key idea • Present an architecture for embedded systems that decompresses offline-compressed instructions at runtime • to reduce power consumption • and improve performance (in most cases)
Pipelined Design – group 1 (stage 1) • Input compressed instructions • Index the decoding table • Forward decompressed instructions
Pipelined Design – group 3: instructions with no immediates (stage 1) • Instructions with no immediates may appear in pairs; such a pair (64 bits) is compressed into a single byte. • A 256-entry fast dictionary table is used: the 8 compressed bits serve as the index that addresses the table. (See the lookup sketch after the stage slides below.)
Pipelined Design – group 3: instructions with no immediates (stage 2)
Pipelined Design – group 3: instructions with no immediates (stage 3)
Pipelined Design – group 3: instructions with no immediates (stage 4)
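The stage diagrams for group 3 did not survive extraction, so here is a minimal C sketch of the fast-dictionary lookup described on the stage 1 slide: one compressed byte indexes a 256-entry table and expands back into a pair of 32-bit instructions. The type names and the idea of returning both words at once are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical fast dictionary for group 3: each 8-bit code selects one of
   256 frequently occurring pairs of no-immediate instructions, so a single
   compressed byte expands back into 64 bits (two 32-bit SPARC words). */
typedef struct {
    uint32_t first;   /* first instruction of the pair  */
    uint32_t second;  /* second instruction of the pair */
} insn_pair_t;

static insn_pair_t fast_dictionary[256];  /* filled offline, per application */

/* One table read per compressed byte; the pipeline can then forward the
   two recovered instructions to the CPU on consecutive cycles. */
static void decode_group3(uint8_t code, uint32_t out[2])
{
    insn_pair_t p = fast_dictionary[code];
    out[0] = p.first;
    out[1] = p.second;
}
```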
Experimental results • Four different applications were used: • an algorithm for computing 3D vectors for a motion picture ("i3d") • a complete MPEG-II encoder ("mpeg") • a smoothing algorithm for digital images ("smo") • a trick animation algorithm ("trick") • A simulation tool written in C was used to obtain performance data for the decompression engine
Experimental results (cont'd) • The decompression engine is application specific. • For each application, a decoding table and a fast dictionary table are built that decompress that particular application only.
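Since the tables are built per application, the offline toolchain presumably scans that application's instruction stream and keeps the most frequent no-immediate pairs for the fast dictionary. The paper does not spell out this construction step, so the C sketch below is only a plausible illustration; is_no_immediate() and the simple linear tally are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define DICT_SIZE 256

typedef struct { uint32_t first, second; unsigned long count; } pair_stat_t;

/* Sort helper: most frequent pairs first. */
static int by_count_desc(const void *a, const void *b)
{
    const pair_stat_t *pa = a, *pb = b;
    return (pa->count < pb->count) - (pa->count > pb->count);
}

/* Scan the application's instruction stream once, tally adjacent pairs of
   no-immediate instructions with a simple linear search (adequate for a
   sketch), and keep the DICT_SIZE most frequent pairs as the per-application
   fast dictionary. Returns the number of dictionary entries filled. */
size_t build_fast_dictionary(const uint32_t *insns, size_t n,
                             int (*is_no_immediate)(uint32_t),
                             uint32_t dict[][2])
{
    pair_stat_t *stats = NULL;
    size_t nstats = 0;

    for (size_t i = 0; i + 1 < n; i++) {
        if (!is_no_immediate(insns[i]) || !is_no_immediate(insns[i + 1]))
            continue;
        size_t j;
        for (j = 0; j < nstats; j++)
            if (stats[j].first == insns[i] && stats[j].second == insns[i + 1])
                break;
        if (j == nstats) {                    /* new pair: append a counter */
            stats = realloc(stats, (nstats + 1) * sizeof *stats);
            stats[nstats] = (pair_stat_t){ insns[i], insns[i + 1], 0 };
            nstats++;
        }
        stats[j].count++;
    }

    if (nstats)
        qsort(stats, nstats, sizeof *stats, by_count_desc);

    size_t kept = nstats < DICT_SIZE ? nstats : DICT_SIZE;
    for (size_t k = 0; k < kept; k++) {
        dict[k][0] = stats[k].first;
        dict[k][1] = stats[k].second;
    }
    free(stats);
    return kept;
}
```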
Why worse performance on smo with a 512-byte instruction cache? • smo does not require much memory (it executes in tight loops). • It generates very few misses even at this cache size. • The compressed architecture therefore cannot help an already almost perfect hit ratio, and the slowdown added by the decompression engine prevails.
Conclusion & Contributions of the paper • This paper designed an instruction decompression engine as a soft IP core for low power embedded systems. • Applications run faster than on systems with no code compression (due to improved cache performance). • Lower power consumption (due to smaller memory requirements for the executable program and fewer memory accesses)
Relation to CSE520 • Improves system performance and power consumption by using a pipelined architecture in the decompression engine. • A different architecture design for lower power consumption in embedded systems. • Smaller caches perform better with the compressed architecture; larger caches perform better with the uncompressed architecture. • Cache hit ratio
Our project • Goal: • Improve the efficiency of power management in embedded multicore systems • Idea: • Use different power modes within a given power budget via a global power management policy (from Jun Shen's presentation) • Use the SAMC algorithm and this decompression architecture as another factor in the simulation (this paper) • How? • SimpleScalar tool set • Try a simple function first, then try the different power modes