
Offline Compression for On-Chip RAM

John Regehr, University of Utah. Joint work with Nathan Cooprider.


Presentation Transcript


  1. John Regehr University of Utah Joint work with Nathan Cooprider Offline Compression for On-Chip RAM

  2. Microcontrollers (MCUs) • $12.5 billion market in 2006 • ~10 billion units / year • MCU-based products typically highly sensitive to unit cost

  3. On-Chip RAM is Small • Atmel AVR examples • mega48 – 0.5 KB RAM – $1.50 • mega128 – 4 KB RAM – $8.75 • mega256 – 8 KB RAM – $10.66 • SRAM can dominate power consumption of a sleeping chip • On-chip flash memory is larger • 1 transistor / bit vs. 6 transistors / bit for SRAM

  4. Out of RAM – What Next? • Manually optimize RAM usage? • Remove features? • Buy MCUs with more RAM?

  5. Is RAM Used Efficiently? • Performed byte-level value profiling for TinyOS applications • Simulator counts distinct values stored in each byte of RAM over a number of executions • Result: Average byte stores four values! • These applications are already tuned for RAM usage
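The profiling step on slide 5 can be sketched as a hook that a simulator calls on every store. This is only an illustration: the names `profile_store` and `profile_distinct` and the 16-byte `RAM_BYTES` are assumptions, not part of the authors' tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_BYTES 16u /* hypothetical RAM size, kept tiny for the sketch */

/* One 256-bit set per RAM byte: bit v is set once value v has been stored. */
static uint8_t seen[RAM_BYTES][32];

/* Clear the profile before a set of executions. */
void profile_reset(void) {
    memset(seen, 0, sizeof seen);
}

/* Hook to call on every simulated store of `value` to RAM address `addr`. */
void profile_store(size_t addr, uint8_t value) {
    assert(addr < RAM_BYTES);
    seen[addr][value >> 3] |= (uint8_t)(1u << (value & 7u));
}

/* Number of distinct values ever stored at `addr`. */
unsigned profile_distinct(size_t addr) {
    assert(addr < RAM_BYTES);
    unsigned count = 0;
    for (unsigned v = 0; v < 256; v++) {
        if (seen[addr][v >> 3] & (1u << (v & 7u))) {
            count++;
        }
    }
    return count;
}
```

Averaging `profile_distinct` over all addresses gives the kind of per-byte figure the slide reports.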

  6. RAM Compression • Use static analysis to find redundancy in scalars, pointers, structs, arrays in embedded C code • Automatically perform sub-word packing to save RAM

  7. Static Analysis • Input: C program • For example, output of nesC compiler • Output: Value set for each variable • E.g. {0,1} for a boolean • Analysis built on standard techniques • However: • Very aggressive compared to e.g. gcc • Many tricks to get good precision on TinyOS applications – several novel

  8. x ≝ variable that occupies n bits • Vx ≝ conservative estimate of the value set of x • log2|Vx| < n ⇒ RAM compression possible • Cx ≝ another set such that |Cx| = |Vx| • fx ≝ bijection from Vx to Cx • n − log2|Cx| bits saved through compression of x
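The savings formula on slide 8 can be checked with a few lines of C. The helper names `bits_needed` and `bits_saved` are hypothetical, and rounding log2 up to a whole number of bits is an assumption of this sketch, since the slide writes the logarithm without a ceiling.

```c
#include <assert.h>

/* Smallest b with 2^b >= cardinality, i.e. log2 rounded up to whole bits. */
unsigned bits_needed(unsigned cardinality) {
    assert(cardinality >= 1);
    unsigned bits = 0;
    while ((1ull << bits) < cardinality) {
        bits++;
    }
    return bits;
}

/* Bits saved when an n-bit variable is stored as an index into its value set.
 * Returns 0 when compression would not help. */
unsigned bits_saved(unsigned n, unsigned cardinality) {
    unsigned b = bits_needed(cardinality);
    return b < n ? n - b : 0;
}
```

For a 16-bit variable with four possible values, `bits_saved(16, 4)` is 14 bits.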

  9. void (*function_queue[8])(void);

  10. void (*function_queue[8])(void); • x = one element of function_queue • n = size of a function pointer = 16 bits

  11. Vx = { &function_A, &function_B, &function_C, NULL }

  12. n = 16 bits • |Vx| = 4 • log2|Vx| < n ⇒ 2 < 16

  13. Cx = { 0, 1, 2, 3 } • fx : Vx → Cx ≝ compression • fx⁻¹ : Cx → Vx ≝ decompression

  14. Vx is stored as a table in ROM, indexed by Cx = { 0, 1, 2, 3 } • fx ≝ compression = table scan • fx⁻¹ ≝ decompression = table lookup

  15. Same ROM table • 128 bits (8 entries × 16 bits) reduced to 16 bits (8 entries × 2 bits) • 112 bits of RAM saved
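The function_queue example from slides 9 to 15 might look roughly like this after transformation. It is a hand-written sketch, not CComp output: `task_fn`, `value_table`, `packed_queue`, and the `queue_*` helpers are hypothetical names, and the three task functions are empty stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn)(void);

/* Empty stand-ins for the three tasks named on the slides. */
void function_A(void) {}
void function_B(void) {}
void function_C(void) {}

/* Decompression table. It is const so a toolchain can keep it out of RAM;
 * on AVR this would additionally need something like PROGMEM. */
static const task_fn value_table[4] = { function_A, function_B, function_C, NULL };

/* Eight 2-bit indices packed into 16 bits of RAM (was 8 x 16 = 128 bits). */
static uint16_t packed_queue;

/* Compression: scan the table for the pointer and return its 2-bit index. */
static uint8_t compress(task_fn f) {
    for (uint8_t i = 0; i < 4; i++) {
        if (value_table[i] == f) {
            return i;
        }
    }
    assert(!"value outside the analyzed value set");
    return 0;
}

/* Decompression: a single table lookup. */
static task_fn decompress(uint8_t idx) {
    return value_table[idx & 3u];
}

/* Rewritten write to function_queue[slot]. */
void queue_write(unsigned slot, task_fn f) {
    assert(slot < 8);
    unsigned shift = slot * 2u;
    unsigned cleared = packed_queue & ~(3u << shift);
    packed_queue = (uint16_t)(cleared | ((unsigned)compress(f) << shift));
}

/* Rewritten read of function_queue[slot]. */
task_fn queue_read(unsigned slot) {
    assert(slot < 8);
    return decompress((uint8_t)((packed_queue >> (slot * 2u)) & 3u));
}

/* Rewritten initializer: all-zero bits would mean function_A, not NULL. */
void queue_init(void) {
    for (unsigned slot = 0; slot < 8; slot++) {
        queue_write(slot, NULL);
    }
}
```

Because NULL sits at index 3 here, as in the slide ordering, the initializer has to be rewritten explicitly. Placing NULL at index 0 would let zero-initialized RAM serve as the empty queue.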

  16. Implementation • Source-to-source transformation for C • Rewrite declaration, initializer, reads, writes • What about value sets of size 1? • log2 1 = 0 • No bits required – just propagate the constant • Optimizations • Avoid table-driven compression funcs when possible • Align hot compressed values on word boundaries • Merge redundant compression tables • Compile-time compression when storing constants
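Two of the cases on slide 16 can be illustrated without a table. The sketch below assumes a hypothetical contiguous value set {100, 101, 102, 103}, where compression is a plain offset, and a hypothetical singleton value set {42}, where no RAM is needed at all. Whether CComp uses exactly these forms is not stated on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical contiguous value set {100, 101, 102, 103}. */
enum { RANGE_BASE = 100, RANGE_COUNT = 4 };

/* Compression without a table: subtract the base to get a 2-bit index. */
uint8_t compress_range(uint8_t v) {
    assert(v >= RANGE_BASE && v < RANGE_BASE + RANGE_COUNT);
    return (uint8_t)(v - RANGE_BASE);
}

/* Decompression without a table: add the base back. */
uint8_t decompress_range(uint8_t idx) {
    assert(idx < RANGE_COUNT);
    return (uint8_t)(idx + RANGE_BASE);
}

/* Hypothetical value set of size 1: log2 1 = 0 bits, so the variable
 * disappears and every read is replaced by the constant. */
enum { ONLY_VALUE = 42 };

uint8_t read_singleton(void) {
    return ONLY_VALUE;
}
```

An offset costs one add or subtract per access, compared with a scan over the table for compression in the table-driven scheme.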

  17. RAM Compression Results • * ⇒ simulator unavailable

  18. RAM Compression Results • Constant Prop / DCE: 10% RAM reduction, 20% ROM reduction, 5.9% duty cycle reduction • * ⇒ simulator unavailable

  19. RAM Compression Results • Constant Prop / DCE: 10% RAM reduction, 20% ROM reduction, 5.9% duty cycle reduction • Compression: 22% RAM reduction, 3.6% ROM reduction, 29% duty cycle increase • * ⇒ simulator unavailable

  20. Tuning RAM Compression • Can elect to not compress some compressible variables • But which ones? • For each compressible variable compute a cost / benefit ratio • Cost – estimated penalty in ROM or CPU cycles • Benefit – RAM savings • Sort compressible variables by ratio and compress until some threshold is met
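The selection loop on slide 20 can be sketched with `qsort`. Everything specific here is an assumption: the `candidate` struct, the use of a single scalar `cost`, and a threshold expressed as a target number of RAM bits to save. The slides do not say how the threshold is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *name;    /* hypothetical variable name */
    double cost;         /* estimated ROM or CPU penalty, units assumed */
    unsigned bits_saved; /* benefit: RAM bits saved if compressed, must be > 0 */
    int compress;        /* output: 1 if selected for compression */
} candidate;

/* Order candidates by ascending cost / benefit ratio. */
static int by_ratio(const void *a, const void *b) {
    const candidate *x = a;
    const candidate *y = b;
    assert(x->bits_saved > 0 && y->bits_saved > 0);
    double rx = x->cost / x->bits_saved;
    double ry = y->cost / y->bits_saved;
    return (rx > ry) - (rx < ry);
}

/* Sort by ratio, then select the cheapest candidates until at least
 * target_bits of RAM are saved. Returns the number of bits saved. */
unsigned select_candidates(candidate *c, size_t n, unsigned target_bits) {
    unsigned saved = 0;
    qsort(c, n, sizeof *c, by_ratio);
    for (size_t i = 0; i < n; i++) {
        c[i].compress = saved < target_bits;
        if (c[i].compress) {
            saved += c[i].bits_saved;
        }
    }
    return saved;
}
```

Raising `target_bits` moves more variables into the compressed set, trading ROM or CPU cycles for RAM.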

  21. Cost/Benefit Ratio • C ≝ access profile • A, B ≝ platform-specific costs • V ≝ cardinality of value set

  22. Cost/Benefit Ratio • C ≝ access profile • A, B ≝ platform-specific costs • V ≝ cardinality of value set • Su ≝ original size • Sc ≝ compressed size

  23–34. Turning the RAM Knob • Chart shown at compression levels from 0% to 100% in steps of 10%, then at 95%

  35–37. Compression Spectrum • Chart, with the 95% setting highlighted

  38. Conclusion • RAM likely to remain scarce in low-cost, low-power MCUs • RAM is used inefficiently • Manual RAM optimization is unpleasant • Tradeoffs are useful • CComp implements RAM compression http://www.cs.utah.edu/~coop/research/ccomp/

  39. Analysis Times

  40. No RAM Compression * ⇒ simulator unavailable

  41. Full RAM Compression * ⇒ simulator unavailable
