Xavier de Gunten

Kaxiras and Martonosi: “Computer Architecture Techniques for Power-Efficiency,” Synthesis Lectures on Computer Architecture, 2008, Ch. 4.8–4.13. Xavier de Gunten.

Presentation Transcript


  1. Kaxiras and Martonosi: “Computer Architecture Techniques for Power-Efficiency,” Synthesis Lectures on Computer Architecture, 2008, Ch. 4.8–4.13, Xavier de Gunten

  2. Sections • 4.8 Idle-Capacity Switching Activity: Caches • 4.9 Parallel Switching-Activity in Set-Associative Caches • 4.10 Cacheable Switching Activity • 4.11 Speculative Activity • 4.12 Value-Dependent Switching Activity: Bus Encodings • 4.13 Dynamic Work Steering

  3. Idle-Capacity Switching Activity • Cache resizing that trades memory between two cache levels • Selective Cache Ways – resizing through associativity • Accounting Cache – combination of the first two • CAM-tag resizing

  4. Cache Resizing: trading memory between two cache levels • Structures partitioned into segments connected by buffered wires • Capacity traded between L1 and L2 by altering associativity • CPI and program “phase” determine the organization of the caches
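The phase-driven choice of organization can be sketched as follows. This is a minimal sketch under stated assumptions, not the published controller: the candidate (L1 ways, L2 ways) splits, the 25% phase-change threshold, and the idea of tracking one per-interval metric (for example branches or misses per interval) are all hypothetical, and `measure_cpi` abstracts away running one interval per candidate.

```python
from typing import Callable, Optional, Sequence, Tuple

Config = Tuple[int, int]  # hypothetical encoding: (L1 ways, L2 ways)

# Hypothetical splits of one pool of cache segments (illustrative values only).
CONFIGS: Sequence[Config] = [(1, 7), (2, 6), (4, 4)]


def phase_changed(prev: float, cur: float, threshold: float = 0.25) -> bool:
    """Declare a new phase when a per-interval metric moves by more than `threshold`."""
    if prev == 0:
        return cur != 0
    return abs(cur - prev) / prev > threshold


def choose_config(configs: Sequence[Config],
                  measure_cpi: Callable[[Config], float]) -> Config:
    """Exploration step: evaluate every candidate and keep the lowest-CPI one."""
    return min(configs, key=measure_cpi)


class PhaseController:
    """Re-explores the cache organizations only when the program phase changes."""

    def __init__(self, configs: Sequence[Config], threshold: float = 0.25):
        self.configs = list(configs)
        self.threshold = threshold
        self.prev_metric: Optional[float] = None
        self.current = self.configs[0]

    def end_of_interval(self, metric: float,
                        measure_cpi: Callable[[Config], float]) -> Config:
        if self.prev_metric is None or phase_changed(self.prev_metric, metric, self.threshold):
            self.current = choose_config(self.configs, measure_cpi)
        self.prev_metric = metric
        return self.current
```

Exploring only at phase boundaries keeps the cost of trying suboptimal organizations low while the program behavior is stable.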

  5. Selective Cache Ways – Resizing through associativity • Resize one large cache by changing its associativity • Large cache partitioned into subarrays • Disabling a cache way => normal accesses ignore it, but its tags remain active
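A tag-only toy model of selective cache ways shows why disabling ways saves energy: every access reads only the enabled ways. The class name, the LRU replacement, and the `way_reads` energy proxy are illustrative assumptions; in particular, this sketch invalidates lines in disabled ways, whereas the design on the slide keeps their tags active.

```python
class SelectiveWaysCache:
    """Tag-only model of a set-associative cache whose ways can be disabled.

    Simplification (assumption): shrinking invalidates lines in disabled ways,
    whereas the design on the slide keeps the tags of disabled ways active.
    """

    def __init__(self, num_sets: int, num_ways: int, line_bytes: int = 64):
        self.num_sets, self.num_ways, self.line_bytes = num_sets, num_ways, line_bytes
        self.enabled = num_ways  # ways 0 .. enabled-1 are active
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]  # MRU first
        self.way_reads = 0  # energy proxy: ways read in parallel, summed over accesses

    def set_enabled_ways(self, n: int) -> None:
        self.enabled = max(1, min(n, self.num_ways))
        for set_tags in self.tags:
            for w in range(self.enabled, self.num_ways):
                set_tags[w] = None

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        s, tag = line % self.num_sets, line // self.num_sets
        self.way_reads += self.enabled  # only enabled ways are probed
        for w in range(self.enabled):
            if self.tags[s][w] == tag:
                self._touch(s, w)
                return True
        # Miss: replace the least recently used *enabled* way.
        victim = next(w for w in reversed(self.lru[s]) if w < self.enabled)
        self.tags[s][victim] = tag
        self._touch(s, victim)
        return False

    def _touch(self, s: int, w: int) -> None:
        self.lru[s].remove(w)
        self.lru[s].insert(0, w)
```

Running the same trace with fewer enabled ways lowers `way_reads` per access at the cost of a possibly higher miss count, which is the trade-off a resizing policy has to balance.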

  6. Accounting Cache • Cross between the selective-ways cache and a variable L1/L2 division • “Fake” L2 cache • True LRU replacement policy • One-shot configuration • Energy savings of 35–53% with a performance loss of 1–4%
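True LRU is what makes one-shot configuration possible: if the primary part of each set always holds its most recently used lines, the LRU stack position of every hit tells how that access would have fared under every primary/secondary split. The sketch below records those positions in one pass; the class name and the energy constants (`e_way`, `e_miss`) are hypothetical, chosen only to illustrate picking a split.

```python
from collections import Counter


class LRUStackProfiler:
    """Records the LRU stack position of every hit in an N-way cache (assumed model)."""

    def __init__(self, num_sets, num_ways, line_bytes=64):
        self.num_sets, self.num_ways, self.line_bytes = num_sets, num_ways, line_bytes
        self.stacks = [[] for _ in range(num_sets)]  # per set, MRU first
        self.hits_at = Counter()                     # stack position -> hit count
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        s, tag = line % self.num_sets, line // self.num_sets
        stack = self.stacks[s]
        if tag in stack:
            self.hits_at[stack.index(tag)] += 1
            stack.remove(tag)
        else:
            self.misses += 1
            if len(stack) == self.num_ways:
                stack.pop()  # evict the LRU line
        stack.insert(0, tag)

    def outcome(self, a_ways):
        """(primary hits, secondary hits, misses) if the primary part has a_ways ways."""
        primary = sum(n for pos, n in self.hits_at.items() if pos < a_ways)
        secondary = sum(n for pos, n in self.hits_at.items() if pos >= a_ways)
        return primary, secondary, self.misses

    def energy(self, a_ways, e_way=1.0, e_miss=20.0):
        """Hypothetical cost: every access reads the primary ways; primary misses
        also read the secondary ways; true misses pay e_miss."""
        p, s, m = self.outcome(a_ways)
        b_ways = self.num_ways - a_ways
        return (p + s + m) * a_ways * e_way + (s + m) * b_ways * e_way + m * e_miss
```

After one profiling interval, `min(range(1, num_ways + 1), key=profiler.energy)` picks a split without trying each configuration in turn; a real policy would also bound the performance loss.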

  7. CAM-Tag Cache Resizing • RAM tags -> high performance, CAM tags -> power efficiency • Resizing is more advantageous for highly associative CAM-tag caches • Resizing granularity is finer for CAM • CAM-tag caches can be resized individually per set, because bit lines run across the ways of a set • Control policy -> performance-based feedback loop • Based on the number of misses in a given time window
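The miss-based feedback loop can be sketched as a small controller that runs at the end of every time window. The low/high miss thresholds and the one-way-per-window step are hypothetical parameters, not values from the source.

```python
class MissFeedbackResizer:
    """Adjusts the number of active ways from the miss count of each time window."""

    def __init__(self, max_ways, low, high, min_ways=1):
        self.max_ways, self.min_ways = max_ways, min_ways
        self.low, self.high = low, high  # hypothetical thresholds (misses per window)
        self.active = max_ways
        self.window_misses = 0

    def record_miss(self):
        self.window_misses += 1

    def end_window(self):
        if self.window_misses > self.high and self.active < self.max_ways:
            self.active += 1  # too many misses: give capacity back
        elif self.window_misses < self.low and self.active > self.min_ways:
            self.active -= 1  # few misses: shrink to save power
        self.window_misses = 0  # start a new window
        return self.active
```

Because a CAM-tag cache can be resized per set, one such controller could be kept per set instead of one for the whole cache.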

  8. CAM-Tag Cache Resizing

  9. Parallel Switching Activity in Set-Associative Caches • Power of a parallel-access set-associative cache grows linearly with associativity • Phased Cache • Phase 1 -> tags • Phase 2 -> data (matching way only)
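Counting array reads makes the phased-cache saving concrete: a parallel cache reads every data way on every access, while a phased cache reads data only from the way that matched in the tag phase. The functions below are an illustrative tally under that model; they ignore the extra latency of the second phase.

```python
def array_reads(assoc: int, hit: bool, phased: bool) -> tuple:
    """(tag-way reads, data-way reads) for one access to an assoc-way set."""
    tag_reads = assoc  # all tags are compared in both schemes
    if phased:
        data_reads = 1 if hit else 0  # phase 2 reads only the matching way
    else:
        data_reads = assoc  # all data ways are read speculatively in parallel
    return tag_reads, data_reads


def total_data_reads(assoc: int, hits: int, misses: int, phased: bool) -> int:
    """Data-way reads over a run with the given hit and miss counts."""
    return (hits * array_reads(assoc, True, phased)[1]
            + misses * array_reads(assoc, False, phased)[1])
```

For an 8-way cache with 95 hits and 5 misses, the parallel design performs 800 data-way reads and the phased design 95.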

  10. Sequentially Accessed Set-Associative Cache • Only the way most likely to hit (the MRU way) is probed first • Power/performance similar to a direct-mapped cache • Misses are expensive • Way prediction • Prediction structure holds the MRU information
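MRU way prediction can be sketched as a per-set table that names the way to probe first; a misprediction costs a second probe of the remaining ways. This is a tag-only model, and the round-robin replacement and the `way_probes` counter are simplifying assumptions made for illustration.

```python
class WayPredictedCache:
    """Sequential set-associative cache that probes the predicted (MRU) way first."""

    def __init__(self, num_sets, num_ways, line_bytes=64):
        self.num_sets, self.num_ways, self.line_bytes = num_sets, num_ways, line_bytes
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.mru = [0] * num_sets          # prediction table: MRU way per set
        self.next_victim = [0] * num_sets  # round-robin replacement (simplification)
        self.first_probe_hits = self.second_probe_hits = self.misses = 0
        self.way_probes = 0                # energy proxy

    def access(self, addr):
        line = addr // self.line_bytes
        s, tag = line % self.num_sets, line // self.num_sets
        pred = self.mru[s]
        self.way_probes += 1
        if self.tags[s][pred] == tag:
            self.first_probe_hits += 1  # costs about as much as a direct-mapped access
            return True
        # Misprediction: probe the remaining ways (extra latency and energy).
        self.way_probes += self.num_ways - 1
        for w in range(self.num_ways):
            if w != pred and self.tags[s][w] == tag:
                self.second_probe_hits += 1
                self.mru[s] = w
                return True
        self.misses += 1
        w = self.next_victim[s]
        self.next_victim[s] = (w + 1) % self.num_ways
        self.tags[s][w] = tag
        self.mru[s] = w
        return False
```

When most hits land in the first probe, the cache behaves close to a direct-mapped one in energy, while keeping set-associative hit rates.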

  11. Advanced Way-Prediction Mechanisms • Selective direct-mapping • Set-associative for tags • Direct-mapped for data • Separating conflicting and non-conflicting lines

  12. Multi-MRU • Allow multiple MRU predictors to disambiguate among tags

  13. Way Selection • Location Cache • Stores the L2 way position of lines needed by L1 misses • Way Halting • A partial tag compare halts the access to every way whose low-order tag bits mismatch • Decaying Bloom Filters • Determine which lines are live and which are dead • Reduce the number of ways that need to be searched
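Way halting can be illustrated with a partial tag compare: only ways whose low-order tag bits match proceed to the full tag and data access. The 4-bit partial tag width and the function names are hypothetical choices for this sketch.

```python
def halting_ways(set_tags, tag, partial_bits=4):
    """Indices of ways whose low-order tag bits match; all other ways are halted."""
    mask = (1 << partial_bits) - 1
    return [w for w, t in enumerate(set_tags)
            if t is not None and (t & mask) == (tag & mask)]


def lookup(set_tags, tag, partial_bits=4):
    """Return (hit way or None, number of ways that proceed to the full access)."""
    survivors = halting_ways(set_tags, tag, partial_bits)
    for w in survivors:
        if set_tags[w] == tag:
            return w, len(survivors)
    return None, len(survivors)
```

With tags `[0x12, 0x22, 0x35, None]`, a lookup for `0x22` lets only two ways continue, and a lookup for `0x13` halts every way, so no data array is read.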

  14. Cache Coherence Protocols • Exclude-Jetty • Small tag cache recording blocks that are not cached in L2 • Include-Jetty • Hash table capturing a superset of the blocks cached in L2 • Hybrid-Jetty • Consults both an Exclude-Jetty and an Include-Jetty
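An Include-Jetty style filter can be sketched as several counter tables, each indexed by a different bit field of the block address. A snoop can be filtered whenever any indexed counter is zero, because then the block cannot be cached in L2. The three 6-bit fields are hypothetical; higher address bits are simply ignored, which only adds false positives, never false negatives.

```python
class IncludeJetty:
    """Superset filter of the blocks cached in L2 (counting, multi-index sketch)."""

    def __init__(self, index_bits=(6, 6, 6)):
        self.fields = []
        shift = 0
        for bits in index_bits:
            self.fields.append((shift, (1 << bits) - 1))
            shift += bits
        self.counts = [[0] * (1 << bits) for bits in index_bits]

    def _indices(self, block):
        return [(block >> shift) & mask for shift, mask in self.fields]

    def on_fill(self, block):
        for table, i in zip(self.counts, self._indices(block)):
            table[i] += 1

    def on_evict(self, block):
        for table, i in zip(self.counts, self._indices(block)):
            table[i] -= 1

    def may_be_cached(self, block):
        """False means the snoop can be filtered: the block is definitely not in L2."""
        return all(table[i] > 0 for table, i in zip(self.counts, self._indices(block)))
```

Every snoop answered `False` saves an L2 tag lookup; a Hybrid-Jetty would additionally consult an Exclude-Jetty before accessing L2.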

  15. Cacheable Switching Activity • Work Reuse (Avoiding Repetitive Computation) • Operation Level: Memoization • Instruction Level: Instruction Reuse Buffers • Basic Block Level: Block History Buffer • Trace Level: Groups of consecutive instructions in dynamic execution order
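Operation-level memoization can be sketched as a small direct-mapped table consulted before a costly functional unit (here, a divider). The 64-entry size and the use of Python's `hash` for indexing are assumptions; hardware would index with operand bits instead.

```python
class MemoTable:
    """Direct-mapped memo table for a two-operand operation (e.g. a divider)."""

    def __init__(self, op, entries=64):
        self.op = op
        self.entries = entries
        self.table = [None] * entries  # each entry: (a, b, result)
        self.hits = self.misses = 0

    def compute(self, a, b):
        idx = hash((a, b)) % self.entries
        entry = self.table[idx]
        if entry is not None and entry[0] == a and entry[1] == b:
            self.hits += 1  # reuse: the functional unit stays idle
            return entry[2]
        self.misses += 1
        result = self.op(a, b)  # full computation
        self.table[idx] = (a, b, result)
        return result
```

For example, `MemoTable(lambda a, b: a / b)` computes `6.0 / 3.0` once and serves the repeat from the table; the energy saved depends on how often operands repeat.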

  16. Cacheable Switching Activity • Filter Cache • Small (128–256 byte) cache placed in front of L1 • Loop Cache • Software/compiler controlled • Filter cache for instructions • Trace Cache • Used in the Pentium 4 • Stores traces of instructions in dynamic execution order
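A filter cache shields L1 from most accesses in tight code. The sketch models a 256-byte direct-mapped filter cache with 16-byte lines (the line size is a hypothetical choice) and counts how many accesses still reach L1.

```python
class FilterCache:
    """Tiny direct-mapped cache placed in front of L1; only its misses reach L1."""

    def __init__(self, size_bytes=256, line_bytes=16):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.l1_accesses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        idx, tag = line % self.num_lines, line // self.num_lines
        if self.tags[idx] == tag:
            self.hits += 1
            return True
        self.l1_accesses += 1  # miss: fetch the line from L1 (extra latency)
        self.tags[idx] = tag
        return False
```

For a 16-instruction loop (4-byte instructions) executed 10 times, only the 4 compulsory line misses reach L1 and the other 156 fetches hit in the filter cache.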

  17. Speculative Activity • Incorrect speculation is costly • Instruction reuse buffer • Pipeline gating • Gate (stall) instruction fetch while confidence in branch prediction is low • Trade-off: less wasted work vs. the larger penalty of stalling • Selective Throttling
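Pipeline gating needs a branch confidence estimate and a count of unresolved low-confidence branches. The slide does not name the estimator; this sketch assumes a JRS-style table of resetting counters, and all table sizes and thresholds are hypothetical.

```python
class ConfidenceEstimator:
    """Resetting-counter confidence table (JRS style, assumed), indexed by branch PC."""

    def __init__(self, entries=1024, max_count=15, high_conf=12):
        self.counters = [0] * entries
        self.entries, self.max_count, self.high_conf = entries, max_count, high_conf

    def is_low_confidence(self, pc):
        return self.counters[pc % self.entries] < self.high_conf

    def update(self, pc, correct):
        i = pc % self.entries
        # Count consecutive correct predictions; reset on a misprediction.
        self.counters[i] = min(self.counters[i] + 1, self.max_count) if correct else 0


class PipelineGate:
    """Stalls fetch while too many low-confidence branches are unresolved."""

    def __init__(self, gate_threshold=2):
        self.gate_threshold = gate_threshold
        self.low_conf_in_flight = 0

    def on_fetch_branch(self, low_confidence):
        if low_confidence:
            self.low_conf_in_flight += 1

    def on_resolve_branch(self, low_confidence):
        if low_confidence:
            self.low_conf_in_flight -= 1

    def fetch_allowed(self):
        return self.low_conf_in_flight < self.gate_threshold
```

A lower `gate_threshold` saves more wrong-path work but stalls more often on branches that were in fact predicted correctly.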

  18. Value-Dependent Switching Activity: Bus Encodings • Two factors that drive power consumption • Average number of signal transitions • Capacitance of the wires • Dynamic Base Register Caching • Gray Encoding • T0 Encoding
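The two factors map onto the usual dynamic-power relation P ≈ α·C·V²·f, where encodings attack the activity factor α. The sketch counts bus transitions for a sequential address stream sent in plain binary, in Gray code, and with T0 encoding (the bus is frozen and an extra INC line is raised when the address equals the previous one plus a stride). The function names and the stride of 1 are illustrative assumptions.

```python
def transitions(words):
    """Total bit flips on the bus when `words` are sent back to back."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def gray(x):
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return x ^ (x >> 1)


def t0_encode(addrs, stride=1):
    """T0: hold the bus and raise INC when addr == previous + stride.

    Returns (bus values, INC line values)."""
    bus, inc = [], []
    prev = None
    for a in addrs:
        if prev is not None and a == prev + stride:
            bus.append(bus[-1])  # bus lines keep their old value: no flips
            inc.append(1)
        else:
            bus.append(a)
            inc.append(0)
        prev = a
    return bus, inc
```

For addresses 0 through 15, plain binary causes 26 transitions, Gray code 15, and T0 none on the address lines plus a single transition on INC.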

  19. Address and Data Buses • Bus-inversion encoding • Dictionary-based solutions • Frequent-Value Encoding
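Bus-inversion encoding sends either the word or its complement, whichever causes fewer transitions, and signals the choice on an extra INV line. This greedy sketch counts the INV line's own flip in the decision; the 8-bit width and the all-zero initial bus state are assumptions.

```python
def popcount(x):
    return bin(x).count("1")


def bus_invert_encode(words, width=8):
    """Return (bus value, INV bit) per word, minimizing flips cycle by cycle."""
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    out = []
    for w in words:
        plain, inverted = w & mask, ~w & mask
        cost_plain = popcount(plain ^ prev_bus) + (prev_inv ^ 0)
        cost_inv = popcount(inverted ^ prev_bus) + (prev_inv ^ 1)
        if cost_inv < cost_plain:
            prev_bus, prev_inv = inverted, 1
        else:
            prev_bus, prev_inv = plain, 0
        out.append((prev_bus, prev_inv))
    return out


def bus_invert_decode(encoded, width=8):
    mask = (1 << width) - 1
    return [(~b & mask) if inv else b for b, inv in encoded]


def encoded_transitions(encoded):
    """Flips on the data lines plus the INV line, starting from an all-zero bus."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += popcount(bus ^ prev_bus) + (inv ^ prev_inv)
        prev_bus, prev_inv = bus, inv
    return total
```

The alternating stream `0x00, 0xFF, 0x00, 0xFF` costs 24 transitions unencoded but only 3 with bus inversion, all on the INV line.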

  20. Dynamic Work Steering • Circuit Level: Precomputation • Microarchitectural Level: Deal with idle-width activity • Processor core Level: Activity Migration

  21. Dynamic Work Steering