Kaxiras and Martonosi: “Computer Architecture Techniques for Power Efficiency,” Synthesis Lectures on Computer Architecture, 2008, Ch. 4.8 – 4.13 Xavier de Gunten
Sections • 4.8 Idle-Capacity Switching Activity: Caches • 4.9 Parallel Switching-Activity in Set-Associative Caches • 4.10 Cacheable Switching Activity • 4.11 Speculative Activity • 4.12 Value-Dependent Switching Activity: Bus Encodings • 4.13 Dynamic Work Steering
Idle-Capacity Switching Activity • Cache resizing that trades memory between 2 cache levels • Selective Cache Ways – Resizing through associativity • Accounting Cache – Combination of the two techniques above • CAM-tag Resizing
Cache Resizing: trading memory between 2 cache levels • Structures partitioned in segments with buffered wires • Trading between L1 and L2 by altering associativity • Use CPI and “phase” to determine organization of caches
Selective Cache Ways – Resizing through associativity • Resize 1 big cache through associativity • Large Cache partitioned into subarrays • Disabling a cache way => ignore cache accesses but tags remain active
Accounting Cache • Cross between selective ways cache and variable L1/L2 division • Disabled ways act as a “fake” L2 cache • True LRU replacement policy • One-shot configuration • Energy savings of 35 – 53 % with a performance loss of 1 – 4 %
CAM-Tag Cache Resizing • RAM tags -> high performance, CAM tags -> power efficiency • Resizing is more advantageous for highly associative CAM-tag caches • Resizing granularity is finer for CAM • CAM-tag cache resizing can be done individually per set because bit lines run across the ways of a set • Control policy -> performance-based feedback loop • Based on the # of misses in a given time window
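The performance-based feedback loop above can be sketched as a simple controller: count misses over a fixed access window, then grow or shrink the number of active CAM-tag ways. The class name, thresholds, and window size below are illustrative, not from the book.

```python
# Hypothetical sketch of the miss-count feedback policy for CAM-tag
# resizing. Too many misses in a window -> enable another way;
# very few misses -> disable a way to save power.

class ResizeController:
    def __init__(self, total_ways=32, miss_hi=100, miss_lo=20, window=10000):
        self.total_ways = total_ways
        self.active_ways = total_ways  # start with the full cache enabled
        self.miss_hi = miss_hi         # above this: grow
        self.miss_lo = miss_lo         # below this: shrink
        self.window = window           # accesses per decision window
        self.accesses = 0
        self.misses = 0

    def record(self, hit):
        self.accesses += 1
        if not hit:
            self.misses += 1
        if self.accesses == self.window:
            self._decide()
            self.accesses = 0
            self.misses = 0

    def _decide(self):
        # Fine granularity: a CAM-tag cache can grow/shrink one way at a time.
        if self.misses > self.miss_hi and self.active_ways < self.total_ways:
            self.active_ways += 1
        elif self.misses < self.miss_lo and self.active_ways > 1:
            self.active_ways -= 1
```

In a real design the decision would also drain or invalidate the disabled way's lines; the sketch only models the sizing decision itself.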
Parallel Switching Activity in Set-Associative Caches • A set-associative cache consumes power roughly in proportion to its associativity • Phased Cache • Phase 1: access all tags • Phase 2: access data only in the matching way
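A back-of-envelope model shows why the phased organization saves energy: phase 1 still reads all tag arrays, but phase 2 reads only the one data array that matched, instead of reading every way's data in parallel. The per-array energy constants below are invented units for illustration.

```python
# Toy energy model for parallel vs. phased set-associative access.
TAG_E, DATA_E = 1, 4   # assumed per-array read energies (illustrative)

def parallel_access_energy(ways):
    # Conventional access: all tag and data arrays read at once.
    return ways * (TAG_E + DATA_E)

def phased_access_energy(ways, hit=True):
    energy = ways * TAG_E   # phase 1: read all tags
    if hit:
        energy += DATA_E    # phase 2: read only the hitting way's data
    return energy
```

The saving comes at a latency cost: the data access waits for the tag compare, which is why phased caches are more attractive at lower levels of the hierarchy.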
Sequentially Accessed Set-Associative Cache • Only most likely way to produce hit is probed (MRU) • Similar power/performance to direct-mapped cache • Misses are expensive • Way prediction • Prediction structure to hold MRU information
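The MRU scheme above can be sketched as follows: probe only the predicted way first (direct-mapped-like energy on a correct prediction) and fall back to probing the remaining ways on a first-probe miss, which is where the extra latency comes from. Structure sizes and the replacement policy are simplified assumptions for the sketch.

```python
# Illustrative MRU way prediction for a set-associative cache.
class MRUPredictedCache:
    def __init__(self, num_sets=4, ways=4):
        self.ways = ways
        self.tags = [[None] * ways for _ in range(num_sets)]  # None = empty
        self.mru = [0] * num_sets        # predicted (MRU) way per set
        self.first_probe_hits = 0        # cheap: one way probed
        self.slow_hits = 0               # extra probe round needed
        self.misses = 0

    def access(self, set_idx, tag):
        pred = self.mru[set_idx]
        if self.tags[set_idx][pred] == tag:
            self.first_probe_hits += 1   # direct-mapped-like energy
            return True
        for w in range(self.ways):       # fall back: probe remaining ways
            if w != pred and self.tags[set_idx][w] == tag:
                self.slow_hits += 1
                self.mru[set_idx] = w    # update the prediction
                return True
        self.misses += 1
        victim = pred                    # trivial replacement: prefer empty way
        for w in range(self.ways):
            if self.tags[set_idx][w] is None:
                victim = w
                break
        self.tags[set_idx][victim] = tag
        self.mru[set_idx] = victim
        return False
```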
Advanced Way-Prediction Mechanisms • Selective direct-mapping • Set-associative for tags • Direct-mapped for data • Separating conflicting and non-conflicting lines
Multi-MRU • Allow multiple MRU predictors to disambiguate among tags
Way Selection • Location Cache • Store position of L2 cache lines for L1 misses • Way Halting • Halt parallel access once hit and location are determined in partial tag compare • Decaying Bloom Filters • Determine which lines are live and dead • Decreases the number of ways that need to be searched
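A minimal sketch of the decaying-Bloom-filter idea, assuming one small bit-vector filter per way: a zero bit means the way definitely does not hold the line, so that way can be skipped; periodic decay clears stale entries so dead lines stop being probed. All sizes and the decay policy are illustrative.

```python
# Hypothetical per-way decaying Bloom filters for way selection.
class DecayingBloomWaySelector:
    def __init__(self, ways=4, bits=64, decay_interval=1000):
        self.ways = ways
        self.bits = bits
        self.filters = [[0] * bits for _ in range(ways)]
        self.decay_interval = decay_interval
        self.ticks = 0

    def _h(self, addr):
        return hash(addr) % self.bits    # single hash for simplicity

    def insert(self, way, addr):
        self.filters[way][self._h(addr)] = 1   # line is live in this way

    def candidate_ways(self, addr):
        # Only ways whose filter bit is set need to be probed.
        i = self._h(addr)
        return [w for w in range(self.ways) if self.filters[w][i]]

    def tick(self):
        self.ticks += 1
        if self.ticks % self.decay_interval == 0:
            # Decay: clear all bits. Live lines are re-inserted on their
            # next access; dead lines silently disappear from the filter.
            for f in self.filters:
                for i in range(len(f)):
                    f[i] = 0
```

A production design would decay entries individually (e.g. small counters) rather than wiping the whole filter, but the energy argument is the same: fewer candidate ways means fewer parallel probes.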
Cache Coherence Protocols • Exclude-Jetty • Small tag-cache to determine what is not cached in L2 • Include-Jetty • Hash table to capture superset of what is cached in L2 • Hybrid-Jetty • Consult both an Exclude and Include-Jetty
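The exclude-Jetty can be sketched as a tiny table of addresses known *not* to be in the local L2, consulted before the expensive L2 tag lookup on a snoop. The structure below (a small FIFO list) is a deliberate simplification of the real tag-cache organization.

```python
# Illustrative exclude-Jetty: filter snoops that are known to miss in L2.
class ExcludeJetty:
    def __init__(self, entries=8):
        self.entries = entries
        self.absent = []  # FIFO of line addresses known absent from L2

    def snoop(self, addr, l2_lookup):
        if addr in self.absent:
            return False                # filtered: no L2 tag access needed
        present = l2_lookup(addr)       # fall back to the real L2 tags
        if not present:
            self.absent.append(addr)    # remember the negative result
            if len(self.absent) > self.entries:
                self.absent.pop(0)
        return present

    def on_l2_allocate(self, addr):
        # L2 just allocated this line: it is no longer "known absent".
        if addr in self.absent:
            self.absent.remove(addr)
```

The include-Jetty is the dual: a hash table holding a superset of what *is* cached, so a snoop that misses the table can be filtered instead.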
Cacheable Switching Activity • Work Reuse (Reduce Repetitive Computing) • Operation Level: Memoization • Instruction Level: Instruction Reuse Buffers • Basic Block Level: Block History Buffer • Trace Level: Groups of consecutive instructions based on dynamic execution
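Operation-level memoization can be illustrated with a bounded result table for an expensive functional unit (a divider stands in here): a repeated operand pair skips the computation entirely. The table size and eviction policy are assumptions for the sketch.

```python
# Minimal memoization sketch: cache functional-unit results by operands.
class MemoDivider:
    def __init__(self, entries=16):
        self.table = {}     # (a, b) -> result; bounded like a real table
        self.entries = entries
        self.hits = 0       # reused results (no divider activity)
        self.computes = 0   # actual divider activations

    def divide(self, a, b):
        key = (a, b)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.computes += 1
        result = a // b                  # the "expensive" operation
        if len(self.table) >= self.entries:
            # Evict the oldest entry (dicts preserve insertion order).
            self.table.pop(next(iter(self.table)))
        self.table[key] = result
        return result
```

Instruction reuse buffers, block history buffers, and trace-level reuse generalize the same idea from a single operation to progressively larger units of work.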
Cacheable Switching Activity • Filter Cache • 128-256 byte cache inserted before L1 • Loop Cache • Software/compiler controlled • Filter cache for Instructions • Trace Cache • Pentium 4 • Stores blocks of instructions
Speculative Activity • Incorrect speculation is costly • Instruction reuse buffer • Pipeline gating • Stall the whole pipeline when confidence in branch prediction is low • Trade-off: energy saved by fetching fewer wrong-path instructions vs. the performance penalty of stalling • Selective Throttling
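Pipeline gating can be sketched as a counter of low-confidence branches in flight: once the count crosses a threshold, fetch is stalled until some of those branches resolve. The threshold and the confidence test below are illustrative, not the book's exact policy.

```python
# Hedged sketch of pipeline gating driven by branch-confidence estimates.
class PipelineGate:
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.low_conf_in_flight = 0   # unresolved low-confidence branches

    def on_branch_fetched(self, confidence):
        # confidence in [0, 1] from a branch-confidence estimator
        if confidence < 0.5:
            self.low_conf_in_flight += 1

    def on_branch_resolved(self, was_low_confidence):
        if was_low_confidence:
            self.low_conf_in_flight -= 1

    def fetch_enabled(self):
        # Gate (stall fetch) when too much in-flight speculation is
        # likely to be wrong-path work.
        return self.low_conf_in_flight < self.threshold
```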
Value-Dependent Switching Activity: Bus Encodings • Two factors that drive power consumption • Average # of signal transitions • Capacitance of wires • Dynamic Base Register Caching • Gray Encoding • T0 Encoding
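Gray encoding's benefit on sequentially striding address buses follows from its defining property: consecutive values differ in exactly one bit after encoding, so a sequential address stream toggles one wire per transfer instead of several.

```python
# Standard binary-to-Gray conversion: n XOR (n >> 1).
def to_gray(n):
    return n ^ (n >> 1)
```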
Address and Data Buses • Bus-inversion encoding • Dictionary-based solutions • Frequent-Value Encoding
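Bus-inversion encoding can be sketched in a few lines: if transmitting the new value would flip more than half the wires relative to the bus's current state, send its complement and assert an extra invert line, bounding worst-case transitions at w/2 + 1. The bus width below is an arbitrary example.

```python
WIDTH = 8  # example bus width

def hamming(a, b):
    # Number of wires that would toggle between the two values.
    return bin(a ^ b).count("1")

def bus_invert(prev_wires, value):
    # Returns (wires_to_drive, invert_line).
    mask = (1 << WIDTH) - 1
    if hamming(prev_wires, value) > WIDTH // 2:
        return (~value) & mask, 1   # send complement, invert line high
    return value, 0                 # send as-is, invert line low
```

The receiver simply re-inverts the wires when the invert line is high, so the extra line pays for itself whenever data values swing widely between transfers.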
Dynamic Work Steering • Circuit Level: Precomputation • Microarchitectural Level: Deal with idle-width activity • Processor core Level: Activity Migration