In this talk, Sriram Vajapeyam examines how page-level contention behavior can be exploited for better cache performance, lower tag overheads, and new cache organizations. The study covers tag-bit contention patterns, cache indexing, and how much each tag bit cumulatively participates in cache contention. It also explores ways to reduce the number of contending tag bits as a route to better cache design.
Exploring Improved Cache Organizations Based on Page-Level Contention Behavior
Sriram Vajapeyam
Independent Consultant
June 2002
A New Angle on Improving Caches
• Contention for a Cache Block:
  • Is there a pattern at the Virtual Page level?
• Yes! At least for SPECInt2000:
  • A few pages/page groups contend repeatedly
  • Contending Virtual Addresses differ in just a few bits
• How do we exploit this?
(c) S. Vajapeyam
Exploiting Page-Level Contention Behavior
• Better Choice of Index bits
• Reducing Tag Overheads:
  • New Cache Organization: Sub-Tagged Caches
  • Set-Associative Caches
  • Decoupled Sector Caches
+ Cache-Conscious Virtual Address space allocation
Talk Outline
• Motivation
• Previous Approaches
Page-Level Contention Behavior
• Tag-bit Contention Patterns
• Further Reducing # of Contending Tag-bits
Exploiting Tag-bit Contention Patterns
• Better Cache Indexing
• Sub-tagged DSC and Set-Associative Caches
• Sub-tagged Caches (Direct-Mapped)
Motivation: Further Cache Work
• Processor-Memory Speed Disparity
• Wire Delay
  • smaller caches important for access speed
• Very Deep Processor Pipelines
  • smaller caches important for access speed
• Low Power
  • tag overheads (especially with 64-bit addresses)
  • smaller or banked caches
We look at today's L1 caches
Previous Cache Approaches
Rich body of work exploits program behavior:
• Locality:
  • Temporal, Spatial; Structural [Hsu & Sohi]
• Conflict Patterns:
  • Physical Page Coloring [Bershad '94]
  • Cache Indexing [Gonzalez '97, Agarwal '92, etc.]
  • Page-to-Bank Allocation [Vijay '01]
• Data Access Timing:
  • Cachelets [Shen '01]
A Different Approach: Tag Contention Patterns
• On replacement misses, the tag bits that differ between the replaced and fetched blocks are repetitive and limited
  • e.g. only tag bits 28 and 16 differ between the replaced and fetched block
  • => only tag bits 28 and 16 conflict, or contend
  • this happens repeatedly
Replacement Miss:
• Replaces a live block (valid & to be used again)
• Caused by a contending access (any of the 3 Cs)
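XOR-ing the replaced and fetched block addresses exposes exactly which tag bits contend. The following is a minimal sketch, assuming a 16KB direct-mapped cache with 32B blocks (so the tag is address bits 14 to 31) and bits numbered from the least-significant bit (bit 0). The function name `contending_tag_bits` and the addresses are hypothetical. It examines a single replacement; deciding whether the evicted block was live requires looking ahead in the trace.

```python
# Sketch: which tag bits differ between the replaced and the fetched block?
# Assumptions: bit 0 = least-significant address bit; 16KB direct-mapped cache
# with 32B blocks, so bits 0-4 = block offset, 5-13 = index, 14-31 = tag.
TAG_SHIFT = 14
ADDR_BITS = 32


def contending_tag_bits(replaced_addr: int, fetched_addr: int) -> list[int]:
    """Return the address-bit positions, within the tag, that differ."""
    diff = (replaced_addr ^ fetched_addr) >> TAG_SHIFT
    return [TAG_SHIFT + i for i in range(ADDR_BITS - TAG_SHIFT) if (diff >> i) & 1]


# Hypothetical pair: same set (identical bits 5-13), tags differ in bits 28 and 16.
replaced = 0x2001_2340
fetched = replaced ^ (1 << 28) ^ (1 << 16)
print(contending_tag_bits(replaced, fetched))  # [16, 28]
```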
Study Framework
• IBM RS6000, 32-bit Virtual Addresses
• IBM xlc compiler, -O3 optimization
• SPEC Integer 2000
  • Data refs of 200M instructions after discarding the startup phase
  • (Validated against some 2B traces)
• L1 Caches, Direct-Mapped, Virtual Address
  • 8KB, 16KB (32B blocks); 32KB, 64KB (64B blocks)
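The bit numbers used throughout the talk depend on how each configuration splits a 32-bit virtual address into offset, index, and tag. A small sketch of that arithmetic, assuming power-of-two sizes and bit 0 as the least-significant bit; the helper name `field_ranges` is hypothetical.

```python
# Sketch: offset / index / tag bit ranges of a 32-bit VA for the four
# direct-mapped L1 configurations. Assumes power-of-two sizes, bit 0 = LSB.
ADDR_BITS = 32


def field_ranges(cache_bytes: int, block_bytes: int) -> dict[str, tuple[int, int]]:
    """Return inclusive (low, high) bit ranges for each address field."""
    offset_bits = block_bytes.bit_length() - 1    # log2(block size)
    sets = cache_bytes // block_bytes             # direct-mapped: one block per set
    index_bits = sets.bit_length() - 1            # log2(number of sets)
    return {
        "offset": (0, offset_bits - 1),
        "index": (offset_bits, offset_bits + index_bits - 1),
        "tag": (offset_bits + index_bits, ADDR_BITS - 1),
    }


for kb, block in [(8, 32), (16, 32), (32, 64), (64, 64)]:
    f = field_ranges(kb * 1024, block)
    print(f"{kb}KB/{block}B: index bits {f['index'][0]}-{f['index'][1]}, "
          f"tag bits {f['tag'][0]}-{f['tag'][1]}")
# 8KB/32B: index bits 5-12, tag bits 13-31
# 16KB/32B: index bits 5-13, tag bits 14-31
# 32KB/64B: index bits 6-14, tag bits 15-31
# 64KB/64B: index bits 6-15, tag bits 16-31
```

Under these assumptions the 16KB cache's tag starts at bit 14 and bits 14 and 15 are index bits of the 64KB cache, which is consistent with the bit numbers in the talk's examples.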
Caveats
• SPEC
  • SPECInt only
  • Dynamically-linked code can behave differently
• Just one compiler-machine platform (IBM)
  • we know there are differences with Alpha, SUN
Contention Participation of Individual Tag Bits
• Different tag bits contribute differently to contention
  • not just the LSBs of the tag are important
  • some hardly ever contribute
• Some bits stand out: e.g. the stack/heap bit
Can "compress" the tag representation
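Per-bit participation can be gathered with a small direct-mapped simulation that counts, for every replacement miss, each tag bit that differs. This sketch assumes a 16KB cache with 32B blocks and bit 0 as the LSB. It treats an eviction as a replacement miss only when the evicted block is referenced again later in the trace, which follows the talk's definition of a live block. The function name `per_bit_participation` and the trace are hypothetical.

```python
from collections import Counter

# Sketch: per-tag-bit contention participation in a direct-mapped cache.
# Assumptions: 16KB cache, 32B blocks, bit 0 = LSB; an eviction counts as a
# replacement miss only if the evicted block is referenced again (it was live).
OFFSET_BITS, INDEX_BITS = 5, 9
TAG_SHIFT = OFFSET_BITS + INDEX_BITS


def per_bit_participation(trace: list[int]) -> Counter:
    blocks = [addr >> OFFSET_BITS for addr in trace]
    # next_use[i] = position of the next reference to blocks[i], or None.
    next_use, seen = [None] * len(blocks), {}
    for i in range(len(blocks) - 1, -1, -1):
        next_use[i] = seen.get(blocks[i])
        seen[blocks[i]] = i

    frames = {}          # set index -> (block number, position of last access)
    counts = Counter()   # tag-bit position -> number of replacement misses
    for t, blk in enumerate(blocks):
        idx = blk & ((1 << INDEX_BITS) - 1)
        resident = frames.get(idx)
        if resident is not None and resident[0] != blk and next_use[resident[1]] is not None:
            diff = (resident[0] ^ blk) >> INDEX_BITS   # differing tag bits only
            pos = TAG_SHIFT
            while diff:
                if diff & 1:
                    counts[pos] += 1
                diff >>= 1
                pos += 1
        frames[idx] = (blk, t)
    return counts


# Hypothetical ping-pong between two addresses differing in bits 28 and 16.
A = 0x2000_1000
B = A ^ (1 << 28) ^ (1 << 16)
print(sorted(per_bit_participation([A, B, A, B]).items()))  # [(16, 2), (28, 2)]
```

The final access to B evicts A, but A is never used again, so that eviction is not counted.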
Cumulative Contention Participation of Tag Bits
• 6 tag bits account for a large majority of replacements
  • > 90% for 6 benchmarks [16KB DM cache]
  • > 80% for 3 benchmarks
• LSB tag bits: the 5 least-significant tag bits account for > 80% in 5 benchmarks
=> Important to consider the MSB tag bits as well, not just the LSBs, e.g. in XOR indexing schemes
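The talk does not spell out how "account for" is computed. One plausible reading is sketched below: bits are added in order of participation, and a replacement counts as covered once every bit that differed for it is in the chosen set. That is the condition under which a compressed tag holding only those bits would still tell the two blocks apart. The helper names and the masks are hypothetical.

```python
from collections import Counter

# Sketch: cumulative contention participation of tag bits. Each mask is the XOR
# of the replaced and fetched addresses for one replacement miss (tag bits only).
# Assumption: a replacement is "accounted for" by a bit set when every
# differing bit lies inside that set. The masks below are hypothetical.


def bits_of(mask: int) -> list[int]:
    return [b for b in range(mask.bit_length()) if (mask >> b) & 1]


def cumulative_coverage(masks: list[int]) -> list[tuple[int, float]]:
    """Add bits in order of participation; report coverage after each one."""
    counts = Counter(b for m in masks for b in bits_of(m))
    chosen, result = 0, []
    for bit, _ in counts.most_common():
        chosen |= 1 << bit
        covered = sum(1 for m in masks if m & ~chosen == 0)
        result.append((bit, covered / len(masks)))
    return result


masks = ([(1 << 28) | (1 << 16)] * 5          # e.g. stack/heap bit + bit 16
         + [1 << 17] * 3
         + [(1 << 14) | (1 << 15) | (1 << 16)] * 2)
print(cumulative_coverage(masks))
# [(16, 0.0), (28, 0.5), (17, 0.8), (14, 0.8), (15, 1.0)]
```

In this toy data the most frequent bit (16) covers nothing by itself because it always contends together with other bits. The high-order bit 28 is needed to reach 50% coverage.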
Groups of Contending Tag-bits
• Several tag bits contend together, not individually
  • e.g. bits 15, 16, 17, 18 all differ simultaneously
  • => particular pages/page-groups contend repeatedly
• Examples:
  • gzip: 17
  • perlbmk: 14,18-19,21-22,24-28
  • vpr: 14-16
  • vortex: 14-16
  • eon: 14,18-21,24-28
  • gap: 17-18,20,22,24-28
  • crafty: 17,18,21,23-28
  • parser: 15-17,20-21,23-28
Contribution of Top-10 Tag-bit Groups
Benchmark   16KB Cache   64KB Cache
Eon         83.68 %      99.47 %
Gzip        58.63 %      98.20 %
Perlbmk     59.69 %      94.29 %
Vpr         60.84 %      93.77 %
Gap         77.01 %      91.87 %
Crafty      49.24 %      78.82 %
Twolf       23.18 %      74.61 %
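A sketch of how groups like the ones above could be tallied. It assumes each replacement's full set of differing tag bits is one group, renders the group in the talk's range notation (e.g. "14,18-19,21-22,24-28"), and computes the share of replacements covered by the N most frequent groups. The helper names and masks are hypothetical.

```python
from collections import Counter

# Sketch: treat the whole set of differing tag bits of a replacement miss as one
# "tag-bit group", print it in the talk's range notation, and compute the share
# of replacements covered by the N most frequent groups. Masks are hypothetical.


def to_ranges(mask: int) -> str:
    bits = [b for b in range(mask.bit_length()) if (mask >> b) & 1]
    parts, start = [], None
    for i, b in enumerate(bits):
        if start is None:
            start = b
        if i + 1 == len(bits) or bits[i + 1] != b + 1:   # end of a consecutive run
            parts.append(str(b) if start == b else f"{start}-{b}")
            start = None
    return ",".join(parts)


def top_n_share(masks: list[int], n: int = 10) -> float:
    groups = Counter(masks)                  # identical masks = same group
    return sum(c for _, c in groups.most_common(n)) / len(masks)


perlbmk_like = sum(1 << b for b in [14, 18, 19, 21, 22, 24, 25, 26, 27, 28])
print(to_ranges(perlbmk_like))               # 14,18-19,21-22,24-28
masks = [perlbmk_like] * 6 + [1 << 17] * 3 + [1 << 20]
print(top_n_share(masks, n=2))               # 0.9
```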
Summary of Page-Level Behavior
• Different tag bits participate differently in contention
• 5-6 tag bits cumulatively account for a large majority
• Groups of tag bits participate together in contention
  • => particular pages/page-groups contend frequently
- How do we exploit this?
- Can we further reduce the # of contending tag bits?
Further Reducing # Contending Tag Bits
• Cache-Conscious Compiler: suitable VA space allocation
  • e.g. IBM stack-heap contention
    • relocate the base of the stack or heap
  • similarly for program data structures
  • profile-driven?
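Relocating the stack base can remove a stack/heap conflict in a virtually-indexed cache because it changes the index bits of every stack address. A minimal sketch, assuming a 16KB direct-mapped cache with 32B blocks. The addresses and the 8KB shift (half the cache) are hypothetical.

```python
# Sketch: relocating the stack base so that a hot stack slot and a hot heap
# object stop mapping to the same set. Assumes a 16KB direct-mapped,
# virtually-indexed cache with 32B blocks; addresses and shift are hypothetical.
OFFSET_BITS, INDEX_BITS = 5, 9


def set_index(addr: int) -> int:
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


heap_obj = 0x2000_1040        # hypothetical hot heap address
stack_slot = 0x2FF2_1040      # hypothetical hot stack address
print(set_index(heap_obj) == set_index(stack_slot))   # True: they conflict

STACK_SHIFT = 8 * 1024        # move the stack base by half the cache size
relocated = stack_slot + STACK_SHIFT
print(set_index(heap_obj) == set_index(relocated))    # False
```

A real allocator would have to check that the shifted stack does not simply collide with some other hot region; that is where profile guidance could help.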
Further Reducing # Contending Tag Bits
• Relocation can be:
  • within the 32-bit VA
  • to an extended VA, e.g. add 2 bits to the VA only at the L1 cache
[Figure: a V-V TLB maps each VA to a Relocated-VA; entries are filled by snoop logic or a dynamic optimizer/compiler]
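One way to picture the V-V TLB is as a page-granularity map from a 32-bit VA to a 34-bit relocated VA, used only to index and tag the L1. The two extra high bits keep a relocated page distinct from the real page that has the same low 32 bits. This is a sketch under assumed parameters: 4KB pages, a 16KB direct-mapped L1 with 32B blocks, and a hand-filled table. The class `VVTlb` and its methods are hypothetical.

```python
# Sketch of a hypothetical virtual-to-virtual (V-V) TLB consulted only at the L1
# cache. It maps a 4KB virtual page to a relocated page in a 34-bit extended VA
# space; the 2 extra high bits keep a relocated page distinct from the real page
# with the same low 32 bits. Page size, widths, and entries are assumptions.
PAGE_BITS = 12
EXTRA_BITS = 2
OFFSET_BITS, INDEX_BITS = 5, 9          # assumed 16KB direct-mapped L1, 32B blocks


def set_index(addr: int) -> int:
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


class VVTlb:
    def __init__(self) -> None:
        self.entries: dict[int, int] = {}   # VA page number -> relocated page number

    def fill(self, va_page: int, new_page: int, extra: int) -> None:
        # The talk has entries filled by snoop logic or a dynamic
        # optimizer/compiler; here they are filled by hand.
        assert 0 <= extra < (1 << EXTRA_BITS)
        self.entries[va_page] = (extra << (32 - PAGE_BITS)) | new_page

    def relocate(self, va: int) -> int:
        page, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
        return (self.entries.get(page, page) << PAGE_BITS) | offset


tlb = VVTlb()
tlb.fill(va_page=0x2FF21, new_page=0x2FF23, extra=1)   # move one stack page by 8KB
stack_ra = tlb.relocate(0x2FF2_1040)
heap_ra = tlb.relocate(0x2000_1040)                    # no entry: unchanged
print(hex(stack_ra))                                   # 0x12ff23040
print(set_index(stack_ra) != set_index(heap_ra))       # True: no longer conflict
```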
Exploiting Tag Contention Patterns
• Improve Cache Performance (Hit Rate)
• Reduce Tag Overheads
Better Cache Indexing
• Use frequently conflicting tag bits in the index instead
  • e.g. bit 17 instead of bits 14, 15 for gzip, 64KB cache
  • bit 28 (stack/heap bit) for some benchmarks
• Different from XOR indexing:
  • XOR scatters refs across the entire cache
  • a tag bit can be used to choose a cache bank instead
• Dynamic index selection (at least for some bits) is possible
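Selecting index bits from an arbitrary list of address positions is enough to move a contending tag bit into the index. The following sketch assumes a 64KB direct-mapped cache with 64B blocks (conventional index bits 6 to 15) and, as one reading of the gzip example, uses bit 17 in place of bit 15. Every address bit not used for offset or index goes into the tag, so bit 15 moves into the tag and each address still maps uniquely. The helper `split` and the constants are hypothetical.

```python
# Sketch: direct-mapped index built from an arbitrary list of address bits, so
# a frequently contending tag bit can replace a conventional index bit.
# Assumed example: 64KB cache, 64B blocks; bit 17 takes the place of bit 15.
ADDR_BITS, OFFSET_BITS = 32, 6


def split(addr: int, index_bits: list[int]) -> tuple[int, int]:
    """Return (index, tag); the tag holds every non-offset, non-index bit."""
    index = 0
    for i, b in enumerate(index_bits):
        index |= ((addr >> b) & 1) << i
    tag_bits = [b for b in range(OFFSET_BITS, ADDR_BITS) if b not in index_bits]
    tag = 0
    for i, b in enumerate(tag_bits):
        tag |= ((addr >> b) & 1) << i
    return index, tag


CONVENTIONAL = list(range(6, 16))             # index bits 6..15
MODIFIED = list(range(6, 15)) + [17]          # bit 17 replaces bit 15

a = 0x0001_2340
b = a ^ (1 << 17)                             # differ only in tag bit 17
print(split(a, CONVENTIONAL)[0] == split(b, CONVENTIONAL)[0])  # True: conflict
print(split(a, MODIFIED)[0] == split(b, MODIFIED)[0])          # False
```

Unlike XOR indexing, which folds tag bits into all index bits, this selects a single existing bit. The same selected bit could instead choose a bank.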
Sub-Tagged Caches: Saving Tag Bits
• Cache block: tag = main tag + sub-tag
[Figure: one main tag per block, with individual sub-tags for its sub-blocks]
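A minimal model of a direct-mapped sub-tagged cache might look like the following. Each frame stores one main tag, and each of its sub-blocks stores only a short sub-tag, so sub-blocks whose addresses differ only in sub-tag bits can share a frame. The parameters are hypothetical: 16KB, 64B frames of two 32B sub-blocks, and sub-tag bits 16, 17, and 28 chosen to mirror frequently contending bits. The policy of discarding the whole frame on a main-tag mismatch is also an assumption. `SubTaggedCache` and `gather` are illustrative names.

```python
# Sketch of a direct-mapped sub-tagged cache (hypothetical parameters):
# 16KB = 256 frames x 64B, each frame holds two 32B sub-blocks. A block's tag
# (bits 14..31) is split into a main tag, stored once per frame, and a short
# sub-tag stored per sub-block. Sub-tag positions are an assumption.
ADDR_BITS = 32
SUB_OFFSET_BITS = 5      # 32B sub-blocks
SUB_SEL_BITS = 1         # 2 sub-blocks per frame
INDEX_BITS = 8           # 256 frames
TAG_SHIFT = SUB_OFFSET_BITS + SUB_SEL_BITS + INDEX_BITS    # tag = bits 14..31
SUB_TAG_POS = [16, 17, 28]                                 # assumed contending bits
MAIN_TAG_POS = [b for b in range(TAG_SHIFT, ADDR_BITS) if b not in SUB_TAG_POS]


def gather(addr: int, positions: list[int]) -> int:
    """Pack the selected address bits into a small integer."""
    return sum(((addr >> b) & 1) << i for i, b in enumerate(positions))


class SubTaggedCache:
    def __init__(self) -> None:
        n = 1 << INDEX_BITS
        self.main = [None] * n                                       # main tag per frame
        self.sub = [[None] * (1 << SUB_SEL_BITS) for _ in range(n)]  # None = invalid

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the sub-block."""
        sel = (addr >> SUB_OFFSET_BITS) & ((1 << SUB_SEL_BITS) - 1)
        idx = (addr >> (SUB_OFFSET_BITS + SUB_SEL_BITS)) & ((1 << INDEX_BITS) - 1)
        main, sub = gather(addr, MAIN_TAG_POS), gather(addr, SUB_TAG_POS)
        if self.main[idx] != main:
            # Main-tag mismatch: every sub-block in the frame is discarded.
            self.main[idx] = main
            self.sub[idx] = [None] * (1 << SUB_SEL_BITS)
        elif self.sub[idx][sel] == sub:
            return True
        self.sub[idx][sel] = sub      # replace only this sub-block
        return False


c = SubTaggedCache()
a = 0x2000_1000                       # sub-block 0
b = (a ^ (1 << 28)) | (1 << 5)        # same frame, sub-block 1, differs in bit 28
print([c.access(x) for x in (a, b, a, b)])       # [False, False, True, True]
print(len(MAIN_TAG_POS) + 2 * len(SUB_TAG_POS))  # 21 tag bits per 64B frame
```

For comparison, a conventional 16KB cache with 32B blocks stores two 18-bit tags (36 bits) for the same 64B of data.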
Sub-Tagged Caches: Variable-Size Blocks!
• sub-tags + sub-block prefetch =? variable-size block
[Figure: sub-blocks under one main tag forming an emulated, variable-size block]
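One possible reading of this slide: within a frame that shares one main tag, adjacent valid sub-blocks carrying the same sub-tag behave like one larger block. Prefetching neighbouring sub-blocks under the same sub-tag therefore emulates a variable-size block. The sketch below assumes 32B sub-blocks; the function names and the frame contents are hypothetical.

```python
# Sketch: runs of adjacent valid sub-blocks with equal sub-tags act as one
# emulated block; prefetching neighbours under one sub-tag grows the run.
# Assumes 32B sub-blocks; names and frame contents are hypothetical.
SUB_BLOCK_BYTES = 32


def emulated_blocks(sub_tags: list) -> list[tuple[int, int, int]]:
    """Return (first sub-block, size in bytes, sub-tag) for each run of
    adjacent valid sub-blocks (None = invalid) that share a sub-tag."""
    runs, i = [], 0
    while i < len(sub_tags):
        if sub_tags[i] is None:
            i += 1
            continue
        j = i
        while j + 1 < len(sub_tags) and sub_tags[j + 1] == sub_tags[i]:
            j += 1
        runs.append((i, (j - i + 1) * SUB_BLOCK_BYTES, sub_tags[i]))
        i = j + 1
    return runs


def prefetch_neighbours(sub_tags: list, sel: int, sub_tag: int, span: int) -> None:
    """Fill sub-blocks sel..sel+span-1 (clipped to the frame) with one sub-tag,
    overwriting whatever those sub-blocks held."""
    for k in range(sel, min(sel + span, len(sub_tags))):
        sub_tags[k] = sub_tag


frame = [None] * 8                     # one frame of eight 32B sub-blocks
prefetch_neighbours(frame, 0, 4, 4)    # fetch sub-block 0, prefetch 1-3
frame[5] = 1                           # an unrelated sub-block
print(emulated_blocks(frame))          # [(0, 128, 4), (5, 32, 1)]
```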
Set-Associative Sub-Tagged Caches
• Common main tag for the set, with a sub-tag per block
[Figure: Set1, Set2, Set3, each with one main tag and per-block sub-tags]
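With a common main tag per set, a lookup compares the main tag once and then only the short sub-tags of the ways. In this sketch of a single set, two behaviours are assumptions not taken from the talk: a main-tag mismatch invalidates the whole set, and ways are filled round-robin. It also tallies tag storage for an assumed 4-way 16KB cache with 32B blocks (20-bit tags) and 6-bit sub-tags. `SubTaggedSet` and `tag_bits_per_set` are hypothetical names.

```python
# Sketch of one set in a set-associative sub-tagged cache: the set keeps a single
# main tag and each way keeps only a short sub-tag. Assumptions (not from the
# talk): a main-tag mismatch invalidates the whole set, and ways are filled
# round-robin. Tag widths in the example are hypothetical.
class SubTaggedSet:
    def __init__(self, ways: int) -> None:
        self.main = None
        self.sub_tags = [None] * ways        # None = invalid way
        self.next_victim = 0

    def access(self, main: int, sub: int) -> bool:
        """Return True on a hit; on a miss, allocate a way."""
        if self.main != main:
            self.main = main
            self.sub_tags = [None] * len(self.sub_tags)
            self.next_victim = 0
        elif sub in self.sub_tags:           # hardware compares all sub-tags in parallel
            return True
        self.sub_tags[self.next_victim] = sub
        self.next_victim = (self.next_victim + 1) % len(self.sub_tags)
        return False


def tag_bits_per_set(ways: int, tag_bits: int, sub_tag_bits: int) -> tuple[int, int]:
    """Tag storage for one set, in bits: (conventional, sub-tagged)."""
    return ways * tag_bits, (tag_bits - sub_tag_bits) + ways * sub_tag_bits


s = SubTaggedSet(ways=4)
print([s.access(7, 1), s.access(7, 2), s.access(7, 1)])   # [False, False, True]
# Hypothetical 4-way 16KB cache with 32B blocks: 20-bit tags, 6-bit sub-tags.
print(tag_bits_per_set(4, 20, 6))                          # (80, 38)
```

The saving comes at the cost of flexibility: all ways in a set must agree on the main tag.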
Sub-Tagged DSC (Decoupled Sector Caches)
• Share the main tag across the sector, with a sub-tag per block
[Figure: a sector with one main tag and per-block sub-tags]
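A rough tag-storage comparison at sector granularity helps place the sub-tagged sector among its neighbours. In a decoupled sector cache, as commonly described, each sector holds a few address tags, and each block holds a small pointer selecting one of them. The sub-tagged variant is modelled here, as one reading of the slide, as one shared main tag plus a sub-tag per block. All parameters are hypothetical and valid bits are ignored. The helper `tag_storage` is illustrative.

```python
# Sketch: tag storage per sector, in bits (valid bits ignored), for several
# organizations. Hypothetical parameters: 4 blocks per sector, 18-bit tags, a
# decoupled sector cache (DSC) with 2 address tags per sector, 4-bit sub-tags.


def tag_storage(blocks: int, tag_bits: int, dsc_tags: int, sub_tag_bits: int) -> dict[str, int]:
    pointer_bits = (dsc_tags - 1).bit_length()   # per-block pointer to one DSC tag
    return {
        "per-block tags": blocks * tag_bits,
        "sector cache (one tag)": tag_bits,
        "DSC": dsc_tags * tag_bits + blocks * pointer_bits,
        "sub-tagged sector": (tag_bits - sub_tag_bits) + blocks * sub_tag_bits,
    }


for name, bits in tag_storage(blocks=4, tag_bits=18, dsc_tags=2, sub_tag_bits=4).items():
    print(f"{name}: {bits} bits")
# per-block tags: 72 bits
# sector cache (one tag): 18 bits
# DSC: 40 bits
# sub-tagged sector: 30 bits
```

A plain sector cache is cheapest but forces every block in a sector to come from one address region. The sub-tagged sector costs a little more but lets blocks differ in their (frequently contending) sub-tag bits.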
Acknowledgements
• Portions are joint work with:
  • Siddhartha Tambat (TCCA C.A. Letters, July 2002)
  • S. Muthulaxmi (M.S. Thesis, "A Study of Variable Block Size Caches", Indian Institute of Science, Oct. 1997)
• Intel MRL Equipment Grant
Summary
• "Replacement Misses" exhibit patterns in tag-bit conflicts:
  • A few tag bits dominate
  • Group participation of tag bits (i.e. pages/page-groups)
• Compiler/hardware can enhance tag conflict patterns
• Possible applications:
  • Better cache indexing
  • Sub-tagged caches