
Presentation Transcript


  1. Adaptive Cache Compression for High-Performance Processors. Alaa R. Alameldeen and David A. Wood, Computer Sciences Department, University of Wisconsin-Madison

  2. Outline • Introduction • Motivation • Adaptive Cache Compression • Evaluation Methodology • Reported Performance • Review Conclusion • Critique/Suggestions

  3. Introduction • The increasing performance gap between processors and memory calls for faster memory access • Cache memories – reduce average memory latency • Cache compression – improves the effectiveness of cache memories • Adaptive cache compression – the theme of this discussion

  4. Motivation • Cache compression can improve the effectiveness of cache memories (increase effective cache capacity) • Increasing effective cache capacity reduces the miss rate • Performance will improve!

  5. Adaptive Cache Compression: An Overview • Dynamically optimize cache performance • Use the past to predict the future • How likely is compression to help, hurt, or make no difference to the next reference? • Feedback from previous compressions helps decide whether to compress the next line written to the cache

  6. Adaptive Cache Compression: Implementation • Two-level cache hierarchy • L1 caches (data and instruction) are uncompressed • The L2 cache is unified and optionally compressed • Compression/decompression is used or skipped as necessary • Pros: L1 cache performance is not affected • Cons: compression/decompression introduces latency

  7. Adaptive Cache Compression: L2 Cache Detail • 8-way set associative • A compression information tag is stored with each address tag • 32 segments (8 bytes each) in each set • An uncompressed line occupies 8 segments (at most 4 uncompressed lines per set) • Compressed lines are 1 to 7 segments long • Maximum number of lines in each set = 8 • Least recently used (LRU) lines are evicted • Compaction may be used to make room for a new line (a sketch of these set-layout rules follows)
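The set-layout rules above can be made concrete with a short sketch. This is a minimal model, not the authors' hardware: the function name and the eviction loop are illustrative assumptions, and sizes are counted in 8-byte segments.

```python
SEGMENTS_PER_SET = 32   # 32 segments x 8 bytes = 256 bytes of data per set
MAX_LINES_PER_SET = 8   # at most 8 address tags per set
UNCOMPRESSED_SIZE = 8   # an uncompressed 64-byte line occupies 8 segments

def insert_line(set_sizes, new_size):
    """Insert a line of `new_size` segments, evicting LRU lines as needed.

    `set_sizes` lists the sizes (in segments) of resident lines in LRU
    order, most-recently-used first. Returns the updated list.
    """
    assert 1 <= new_size <= UNCOMPRESSED_SIZE
    # Evict from the LRU end until the new line satisfies both limits:
    # at most 8 lines and at most 32 segments per set.
    while (len(set_sizes) + 1 > MAX_LINES_PER_SET or
           sum(set_sizes) + new_size > SEGMENTS_PER_SET):
        set_sizes.pop()               # evict the least recently used line
    # Compaction would physically close the gaps between lines;
    # logically the new line simply becomes the MRU entry.
    return [new_size] + set_sizes

# Four uncompressed lines fill a set, so placing a 3-segment
# compressed line forces one LRU eviction.
print(insert_line([8, 8, 8, 8], 3))   # -> [3, 8, 8, 8]
```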

  8. Adaptive Cache Compression: To Compress or Not to Compress? • While compression eliminates some L2 misses, it increases the latency of L2 hits (which are more frequent). • However, the penalty for an L2 miss is usually large, and the extra latency due to decompression is usually small. • Compression helps if: (avoided L2 misses) x (L2 miss penalty) > (penalized L2 hits) x (decompression penalty). Example: for a 5-cycle decompression penalty and a 400-cycle L2 miss penalty, compression wins if it eliminates at least one L2 miss for every 400/5 = 80 penalized L2 hits (see the snippet below).
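As a quick illustrative check of this break-even rule (the function name and its defaults are mine, taken from the 5-cycle/400-cycle example above):

```python
def compression_helps(avoided_misses, penalized_hits,
                      miss_penalty=400, decompression_penalty=5):
    """True if cycles saved on avoided misses outweigh decompression cost."""
    return avoided_misses * miss_penalty > penalized_hits * decompression_penalty

# Break-even: one avoided miss pays for 400 / 5 = 80 penalized hits.
print(compression_helps(1, 79))   # True  (under break-even)
print(compression_helps(1, 81))   # False (over break-even)
```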

  9. Adaptive Cache Compression: Classification of Cache References • Classifications of hits • Unpenalized hit (e.g. a reference to address A) • Penalized hit (e.g. a reference to address C) • Avoided miss (e.g. a reference to address E) • Classifications of misses • Avoidable miss (e.g. a reference to address G) • Unavoidable miss (e.g. a reference to address H) [Figure: example LRU stack of addresses A–H; lines past the set's capacity are marked "Evicted". A sketch of the classification logic follows.]
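A hedged sketch of this classification, assuming (per slide 7) that a line at LRU stack depth d would still be resident in an uncompressed cache if d <= 4, and resident under compression if the compressed sizes at depths 1..d fit within the set's 32 segments. The names are illustrative, not the paper's hardware:

```python
SEGMENTS_PER_SET = 32
UNCOMPRESSED_LINES = 4   # 4 lines x 8 segments = 32 segments

def classify(hit, depth, sizes, stored_compressed):
    """Classify an L2 reference.

    depth: LRU stack depth of the referenced address (1-based)
    sizes: sizes[i] = compressed size (segments) of the line at depth i+1
    stored_compressed: whether the referenced line is held compressed
    """
    fits_uncompressed = depth <= UNCOMPRESSED_LINES
    fits_compressed = sum(sizes[:depth]) <= SEGMENTS_PER_SET
    if hit:
        if fits_uncompressed:
            # Present either way; compression only costs decompression time.
            return "penalized hit" if stored_compressed else "unpenalized hit"
        return "avoided miss"        # resident only thanks to compression
    # On a miss, compression could have helped only if the compressed
    # lines down to this depth would have fit in the set.
    return "avoidable miss" if fits_compressed else "unavoidable miss"

# A hit at depth 6 whose top-6 compressed sizes fit in 32 segments
# is a miss that compression avoided.
print(classify(True, 6, [3, 8, 2, 8, 4, 5], True))   # -> "avoided miss"
```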

  10. Adaptive Cache Compression: Hardware Used in Decision-Making • Global Compression Predictor (GCP) • Estimates the recent cost or benefit of compression • On a penalized hit, the controller biases against compression by decrementing the counter (subtracted value = decompression penalty) • On an avoided or avoidable miss, the controller increments the counter by the L2 miss penalty • The controller consults the GCP when allocating a line in the L2 cache • Positive value -> compression has helped, so compress • Negative value -> compression has been penalizing, so don't compress • The size of the GCP determines sensitivity to changes • In this paper, a 19-bit counter is used (saturates at 262143 or -262144); a sketch of this counter follows
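A minimal sketch of the GCP as a 19-bit signed saturating counter, using the update rules and saturation bounds listed above (the class and method names are my assumptions, not the paper's):

```python
GCP_MAX, GCP_MIN = 2**18 - 1, -2**18   # saturates at 262143 / -262144

class GlobalCompressionPredictor:
    def __init__(self, decompression_penalty=5, l2_miss_penalty=400):
        self.value = 0
        self.decompression_penalty = decompression_penalty
        self.l2_miss_penalty = l2_miss_penalty

    def on_penalized_hit(self):
        # Compression cost us a needless decompression; bias against it.
        self.value = max(GCP_MIN, self.value - self.decompression_penalty)

    def on_avoided_or_avoidable_miss(self):
        # Compression saved (or would have saved) a full miss penalty.
        self.value = min(GCP_MAX, self.value + self.l2_miss_penalty)

    def should_compress(self):
        # Consulted when allocating a line in the L2 cache.
        return self.value > 0
```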

  11. Adaptive Cache Compression: Sensitivity • Effectiveness depends on the workload's size, the cache's size, and latencies • Sensitive to L2 cache size (most effective for small L2 caches) • Sensitive to L1 cache size (observe the trade-offs) • Adapting to benchmark phases: changes in phase behaviour may hurt the adaptive policy, which takes time to adapt

  12. Evaluation Methodology • Target system: a dynamically scheduled, out-of-order superscalar SPARC V9 uniprocessor • Simulation parameters: [Table: simulation parameters not reproduced here]

  13. Evaluation Methodology (continued) • Simulator: the Simics full-system simulator, extended with a detailed processor timing simulator (TFSim) and a detailed memory system timing simulator • Workloads: • Multi-threaded commercial workloads from the Wisconsin Commercial Workload Suite • Eight of the SPEC CPU2000 benchmarks • Integer benchmarks (bzip, gcc, mcf, twolf) • Floating-point benchmarks (ammp, applu, equake, swim) • Workloads were selected to cover a wide range of compressibility properties, miss rates, and working-set sizes

  14. Evaluation Methodology (continued) • To understand the utility of adaptive compression, it was compared with two extreme policies: Never compress and Always compress • 'Never' strives to reduce hit latency • 'Always' strives to reduce miss rate • 'Adaptive' strives to optimize between the two

  15. Reported Performance (Average cache capacity) • Figure: average cache capacity during benchmark runs (4 MB uncompressed)

  16. Reported Performance (cache miss rate) Figure: L2 cache miss rate (normalized to “Never” miss rate)

  17. Reported Performance (Runtime) Figure: Runtime for the three compression alternatives (normalized to “Never”)

  18. Reported Performance (sensitivity of adaptive compression to benchmark phase changes) • Top: temporal changes in Global Compression Predictor values • Bottom: effective cache size

  19. Review Conclusion • Compressing all compressible cache lines only improves memory-intensive applications; applications with a low miss rate or low compressibility suffer • Optimizations achieved by the adaptive scheme: • Up to 26% speedup (over the uncompressed scheme) for memory-intensive, highly compressible benchmarks • Performance degradation for other benchmarks < 0.4%

  20. Critique/Suggestions • Data inconsistency: a 17% performance improvement for memory-intensive commercial workloads is claimed on page 2, but 26% is claimed on page 11 • Miscalculation on page 4: the sum of the compressed sizes at stack depths 1 through 7 totals 29; however, this miss cannot be avoided, because the full sum of compressed sizes exceeds the total number of segments in the set (35 > 32) • All in all, the proposed technique doesn't seem to enhance performance significantly with respect to 'Always'

  21. Thank you!
