Efficient Metadata Management for Irregular Data Prefetching. Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin. Regular Prefetching. Some programs access memory sequentially, e.g. an MPEG player. Regular prefetchers are effective and widely used, e.g. the Best Offset prefetcher.
Efficient Metadata Management for Irregular Data Prefetching
Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin
Regular Prefetching
• Some programs access memory sequentially
  • e.g. MPEG player
• Regular prefetchers are effective and widely used
  • e.g. Best Offset prefetcher
[Figure: memory blocks labeled A through G]
The Problem: Irregular Accesses
• Common in many programs
• ~30% performance opportunity for irregular SPEC2006 benchmarks
[Figure: memory blocks labeled A, B, C, D, E, X, Y]
Temporal Prefetchers
• Memorize correlations
• Replay memorized accesses
[Figure: memory blocks labeled A, B, C, D, E, X, Y]
Temporal Prefetchers
• High metadata overhead (10 to 20 MB)
  • Too large to fit on-chip
• Metadata stored off-chip
  • Problematic!
[Figure labels: Cache, DRAM, Metadata, Metadata Traffic, Demand Accesses]
Irregular Stream Buffer (ISB) [MICRO’13]
• Introduced an on-chip metadata cache
• Metadata cache synchronized with TLB
[Figure labels: Cache, Metadata, DRAM, Demand Accesses; annotation: ~4× overhead]
Our Solution: Managed ISB (MISB)
• A new metadata management scheme
  • Decouples metadata management from TLB
  • Prefetches metadata
[Figure labels: Cache, Metadata, Metadata Prefetcher, DRAM, Demand Accesses]
Background: ISB
• Assign a structural address for each access in a stream
• Convert irregular access streams to sequential streams
• Prefetch the next address in structural address space
[Figure: memory blocks labeled A, B, C, D, E, X, Y, with their metadata]
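Note: the structural-address idea above can be illustrated with a small Python sketch. This is a toy model, not ISB's actual training logic, which these slides do not describe. It assumes that each stream receives consecutive structural addresses in first-touch order; the class name `StructuralMapper`, the constant `STREAM_SPAN`, and the stream id `"pc0"` are hypothetical.

```python
class StructuralMapper:
    """Toy model: give each stream's addresses consecutive structural addresses."""

    STREAM_SPAN = 1024  # hypothetical gap between the structural ranges of two streams

    def __init__(self):
        self.phys_to_struct = {}   # physical address -> structural address
        self.struct_to_phys = {}   # structural address -> physical address
        self.next_free = {}        # stream id -> next unused structural address

    def train(self, stream_id, phys):
        """Assign a structural address to `phys` the first time it is seen."""
        if phys in self.phys_to_struct:
            return self.phys_to_struct[phys]
        if stream_id not in self.next_free:
            # Each new stream starts in its own structural range.
            self.next_free[stream_id] = len(self.next_free) * self.STREAM_SPAN
        s = self.next_free[stream_id]
        self.next_free[stream_id] += 1
        self.phys_to_struct[phys] = s
        self.struct_to_phys[s] = phys
        return s

    def predict(self, phys, degree=1):
        """Return the physical addresses that follow `phys` in structural space."""
        s = self.phys_to_struct.get(phys)
        if s is None:
            return []
        return [self.struct_to_phys[s + i]
                for i in range(1, degree + 1)
                if s + i in self.struct_to_phys]


mapper = StructuralMapper()
for addr in ["A", "X", "B"]:          # an irregular stream of block labels
    mapper.train("pc0", addr)
print(mapper.predict("A", degree=2))  # the next two blocks in structural order
```

Once the mapping exists, prefetching reduces to "look up s, then fetch s+1, s+2, ...", which is why an irregular stream behaves like a sequential one in structural space.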
Background: ISB
• ISB’s metadata cache is synchronized with the TLB
[Figure labels: TLB, Cache, On-Chip Metadata, Off-Chip Metadata, DRAM, Demand Accesses]
Deficiencies of ISB
On-Chip Metadata Size Required = TLB Size × Cache Lines Per Page
• Metadata is managed at coarse granularity
  • ~90% of traffic is useless due to lack of spatial locality
• Metadata size is proportional to page size
  • ISB does not scale to large pages
• Metadata size is proportional to TLB size
  • ISB does not work for two-level TLBs
[Figure labels: TLB, On-Chip Metadata, Demand Accesses]
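Note: a quick worked example of the sizing formula above, in Python. The TLB entry counts, page sizes, and the 64-byte cache line are illustrative assumptions, not figures from the talk; the function name `isb_on_chip_entries` is hypothetical.

```python
def isb_on_chip_entries(tlb_entries, page_bytes, line_bytes=64):
    """On-chip metadata entries = TLB size * cache lines per page."""
    lines_per_page = page_bytes // line_bytes
    return tlb_entries * lines_per_page


KIB = 1024
MIB = 1024 * KIB

# Assumed configurations, chosen only to show how the product grows.
small_pages = isb_on_chip_entries(tlb_entries=64, page_bytes=4 * KIB)
large_pages = isb_on_chip_entries(tlb_entries=64, page_bytes=2 * MIB)
second_level = isb_on_chip_entries(tlb_entries=1024, page_bytes=4 * KIB)

print(small_pages)   # 64 entries * 64 lines per page
print(large_pages)   # same TLB, 512x larger pages
print(second_level)  # 16x larger TLB, small pages
```

Under these assumptions the requirement grows from 4,096 entries to 2,097,152 when moving from 4 KiB to 2 MiB pages, and to 65,536 for a 1,024-entry TLB, which is the scaling problem the bullets describe.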
Components of MISB
• Manage metadata at a fine granularity
• Prefetch metadata
• Filter out unnecessary accesses
MISB Operation
[Figure, three animation frames. Labels: On-Chip Metadata, Cache, Metadata Prefetcher, Off-Chip Metadata, DRAM, Demand Accesses. Frame annotations in order: A=?; A=71; A=71.]
MISB Operation
[Figure, four animation frames. Labels: On-Chip Metadata, Cache, Metadata Prefetcher, Off-Chip Metadata, DRAM, Demand Accesses. Frame annotations in order: A=71; 72=?, 73=?; 72=X, 73=B; 72=X, 73=B.]
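Note: the sequence in these frames (resolve A to 71, then request the entries for 72 and 73) can be sketched in Python as follows. This is a toy model under stated assumptions: the on-chip store is an unbounded dictionary with no replacement policy, each off-chip read fetches exactly one mapping, and the names `ToyMISB`, `degree`, and `off_chip_reads` are hypothetical rather than taken from MISB.

```python
class ToyMISB:
    """Toy model of fine-grained metadata caching plus metadata prefetching."""

    def __init__(self, off_chip_ps, off_chip_sp, degree=2):
        self.off_chip_ps = off_chip_ps  # physical -> structural, modeled as DRAM-resident
        self.off_chip_sp = off_chip_sp  # structural -> physical, modeled as DRAM-resident
        self.on_chip_ps = {}            # on-chip copies, one mapping per entry
        self.on_chip_sp = {}
        self.degree = degree            # how many following entries to prefetch
        self.off_chip_reads = 0         # counts metadata traffic to DRAM

    def _read(self, on_chip, off_chip, key):
        """Return one mapping, fetching it from off-chip on an on-chip miss."""
        if key in on_chip:
            return on_chip[key]
        self.off_chip_reads += 1
        if key in off_chip:
            on_chip[key] = off_chip[key]
        return on_chip.get(key)

    def access(self, phys):
        """Look up `phys`, then bring in the next `degree` structural entries."""
        s = self._read(self.on_chip_ps, self.off_chip_ps, phys)
        if s is None:
            return []
        targets = []
        for nxt in range(s + 1, s + 1 + self.degree):
            p = self._read(self.on_chip_sp, self.off_chip_sp, nxt)
            if p is not None:
                targets.append(p)
        return targets


misb = ToyMISB({"A": 71, "X": 72, "B": 73}, {71: "A", 72: "X", 73: "B"})
print(misb.access("A"), misb.off_chip_reads)
```

With the values from the frames, the first access to A costs three single-entry reads (A=71, 72=X, 73=B); a repeated access to A is then served entirely on-chip.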
MISB Operation
[Figure, three animation frames. Labels: On-Chip Metadata, Cache, Metadata Prefetcher, Off-Chip Metadata, DRAM, Demand Accesses, and from the second frame on, Bloom Filter. Frame annotations in order: M=? with "Useless Traffic!"; M=?; M=×.]
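Note: the filtering step can be illustrated with a generic Bloom filter in Python. The slides do not say what MISB inserts into its filter or how it is sized, so this sketch assumes the filter records every address that has a metadata entry off-chip; the sizes, the SHA-256 based hashing, and the names `BloomFilter` and `filtered_lookup` are illustrative choices only.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def filtered_lookup(key, bloom, off_chip, stats):
    """Skip the off-chip metadata read when the filter says the key is absent."""
    if not bloom.might_contain(key):
        stats["filtered"] += 1
        return None
    stats["off_chip_reads"] += 1
    return off_chip.get(key)


off_chip = {"A": 71}
bloom = BloomFilter()
bloom.add("A")
stats = {"filtered": 0, "off_chip_reads": 0}
print(filtered_lookup("A", bloom, off_chip, stats), stats)
```

A lookup for an address that was never inserted, such as M in the frames, is answered on-chip in most cases and never reaches DRAM; a false positive only costs one wasted read and never returns wrong metadata.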
Evaluation Methodology
• Industrial simulator
  • ARMv8 AArch64
  • OoO core
  • 2-level TLB
  • Bandwidth: 32 GB/s
• Multicore: ChampSim
  • Similar trends as the industrial simulator
• SPEC2006
  • Irregular subset
• CloudSuite
Evaluated Prefetchers
• STMS & Domino: global correlation
• MISB: PC localization
• ISB: PC localization
Global vs. PC-Localization
while (!end) { read tree->next; if (condition) read linked_list->next; }
Global stream: F B a1 A a2 D C E a3 …
PC-localized streams: F B A D C E … and a1 a2 a3 …
• PC-localization: segregate the global stream by the load instruction’s PC
• PC-localized streams are more predictable!
[Figure: memory blocks labeled A through G and a1, a2, a3]
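Note: PC-localization amounts to grouping the global access stream by the PC of the load that issued each access. A minimal Python sketch, using the interleaved stream from this slide; the PC labels `"pc_tree"` and `"pc_list"` and the function name `pc_localize` are hypothetical stand-ins for the two loads in the loop.

```python
from collections import defaultdict


def pc_localize(accesses):
    """Split a global stream of (pc, address) pairs into one stream per PC."""
    streams = defaultdict(list)
    for pc, addr in accesses:
        streams[pc].append(addr)  # order within each PC is preserved
    return dict(streams)


# Global stream from the slide, tagged with the (assumed) issuing load.
global_stream = [
    ("pc_tree", "F"), ("pc_tree", "B"), ("pc_list", "a1"),
    ("pc_tree", "A"), ("pc_list", "a2"), ("pc_tree", "D"),
    ("pc_tree", "C"), ("pc_tree", "E"), ("pc_list", "a3"),
]

for pc, stream in pc_localize(global_stream).items():
    print(pc, stream)
```

Whether a list access is interleaved at a given point depends on `condition`, so the global order can change from one traversal to the next, while each per-PC stream keeps repeating in the same order.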
Evaluated Prefetchers
• Idealized STMS & Domino
  • Global correlation
  • Metadata not cacheable
• MISB
  • PC localization
  • Metadata cacheable
  • Prefetches metadata
• ISB
  • PC localization
  • Metadata cacheable
  • Syncs metadata with TLB
Traffic Overhead
[Chart: values shown are 1316% and 70%; series labels not recoverable]
Conclusions
• MISB manages metadata effectively
  • Uses fine-grained metadata caching
  • Introduces a metadata prefetcher
• Empirical results
  • 70% traffic overhead vs. 342% for STMS
  • 23% speedup vs. 10% for idealized STMS
• MISB makes temporal prefetching practical
Thank you!