Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors • Abhishek Bhattacharjee and Margaret Martonosi • Department of Electrical Engineering, Princeton University • ASPLOS’10
TLB management • Hardware-managed TLB • No need for expensive interrupts • Pipeline remains largely unaffected • The OS cannot employ an alternative page table design • Software-managed TLB • Data structure design is flexible since the OS controls the page table walk • The miss handler is itself a sequence of instructions • It may itself miss in the instruction cache • The data cache may be polluted by the page table walk
Multiprocessor TLB miss • A CMP maintains per-core instruction and data TLBs • Significant similarities exist in TLB miss patterns across cores
Predictable TLB Miss Patterns • Inter-Core Shared (ICS) TLB Misses • A miss on a translation already accessed by a previous miss on any of the other cores, with the same virtual page, physical page, context ID, and page size • Targeted by Leader-Follower prefetching • Inter-Core Predictable Stride (ICPS) TLB Misses • A miss has a stride of S if its virtual page V+S differs by S from the virtual page V of the preceding matching miss • Core 0 TLB miss virtual pages: 3, 4, 6, 7 • Core 1 TLB miss virtual pages: 7, 8, 10, 11 • On each core, the distances between successive misses are 1, 2, 1 • Although the cores are missing on different virtual pages, they both have the same distance pattern in their misses • Targeted by Distance-based cross-core prefetching
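The distance pattern in the example above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `miss_distances` is hypothetical and not taken from the slides.

```python
def miss_distances(pages):
    """Return the differences between successive TLB-miss virtual pages."""
    return [later - earlier for earlier, later in zip(pages, pages[1:])]


core0_misses = [3, 4, 6, 7]    # virtual pages from the slide
core1_misses = [7, 8, 10, 11]  # virtual pages from the slide

print(miss_distances(core0_misses))  # [1, 2, 1]
print(miss_distances(core1_misses))  # [1, 2, 1]
```

The two cores never miss on the same page at the same position in the sequence, yet their distance sequences are identical, which is the regularity the distance-based prefetcher exploits.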
Leader-Follower Prefetching • If a core (the leader) misses in its TLB on a particular virtual page entry, other cores (the followers) will typically also miss on the same virtual page eventually • Push the leader’s translation for that virtual page to the followers • Insert it not directly into the TLB, but into a small, separate Prefetch Buffer (PB) • A bad prefetch is harmful in that it goes unused • A prefetch may also be harmful in that it evicts existing PB entries too early
Leader-Follower Prefetching • Case 1 • D-TLB miss / PB hit on core 0 • Remove the entry from core 0’s PB • Add the entry to core 0’s D-TLB • Case 2 • D-TLB miss / PB miss on core 1 • The translation is located and refilled into core 1’s D-TLB • The translation is also prefetched (pushed) into the PBs of the other cores
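The two cases above can be sketched as a toy software model. This is a minimal illustration rather than the authors' hardware design: the `Core` class, the `access` function, the unbounded dictionary standing in for the D-TLB, and the FIFO replacement policy for the PB are all assumptions made for the example.

```python
from collections import OrderedDict


class Core:
    """Toy model of one core's D-TLB and prefetch buffer (PB)."""

    def __init__(self, core_id, pb_size=16):
        self.core_id = core_id
        self.tlb = {}            # virtual page -> physical page (unbounded, for brevity)
        self.pb = OrderedDict()  # assumed FIFO prefetch buffer
        self.pb_size = pb_size

    def push_prefetch(self, vpage, ppage):
        """Receive a translation pushed by a leader core."""
        if vpage in self.tlb or vpage in self.pb:
            return  # already present, nothing to do
        if len(self.pb) >= self.pb_size:
            self.pb.popitem(last=False)  # evict the oldest PB entry
        self.pb[vpage] = ppage


def access(cores, core_id, vpage, page_table):
    """Handle one data access and report what happened."""
    core = cores[core_id]
    if vpage in core.tlb:
        return "tlb_hit"
    if vpage in core.pb:
        # Case 1: D-TLB miss / PB hit. Move the entry from the PB to the TLB.
        core.tlb[vpage] = core.pb.pop(vpage)
        return "pb_hit"
    # Case 2: D-TLB miss / PB miss. Refill locally, then push to the others.
    ppage = page_table[vpage]
    core.tlb[vpage] = ppage
    for other in cores:
        if other is not core:
            other.push_prefetch(vpage, ppage)
    return "miss"


cores = [Core(i) for i in range(2)]
page_table = {7: 70}  # hypothetical virtual-to-physical mapping

print(access(cores, 1, 7, page_table))  # miss
print(access(cores, 0, 7, page_table))  # pb_hit
```

In the usage lines, core 1 acts as the leader for page 7, and core 0's later miss on the same page is served from its PB instead of a page table walk.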
Leader-Follower Prefetching • Baseline: prefetch a translation into all the follower cores every time a TLB and PB miss occurs on the leader core • This approach may be over-aggressive • Confidence estimation • 2-bit saturating counters • Each core keeps one counter per other core (e.g., core 0 has counters for cores 1 to N-1) • If the B-bit confidence counter for a follower is greater than or equal to 2^(B-1), prefetch to that follower
Leader-Follower Prefetching • Case 1 • PB hit on core 0; the PB entry is inserted into the D-TLB • Identify the initiating core (core 1) • Increment core 1’s confidence counter corresponding to core 0 • Case 2 • D-TLB / PB miss on core 1 • Check whether the confidence counter is ≥ 2^(B-1) • If core 1’s counter corresponding to core 0 meets this threshold, core 1 pushes the translation into core 0’s PB • Case 3 • A PB entry is evicted from core N-1 without being used • Core N-1 sends a bad-prefetch message to the core that initiated this entry (core 1) • Core 1’s counter corresponding to core N-1 is decremented
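The confidence mechanism in the three cases above amounts to one saturating counter per (leader, follower) pair. The sketch below shows that bookkeeping for a single leader core. The class and method names are hypothetical, and the initial counter value is an assumption (the slides do not state it); it is set to the threshold here so that prefetching starts out enabled.

```python
class ConfidenceCounters:
    """B-bit saturating confidence counters kept by one leader core."""

    def __init__(self, leader_id, num_cores, bits=2):
        self.max_value = (1 << bits) - 1  # 3 for 2-bit counters
        self.threshold = 1 << (bits - 1)  # 2^(B-1), i.e. 2 for 2-bit counters
        # Assumption: counters start at the threshold (not stated in the slides).
        self.counters = {
            core: self.threshold for core in range(num_cores) if core != leader_id
        }

    def should_prefetch(self, follower):
        """Case 2: push to this follower only if confidence meets the threshold."""
        return self.counters[follower] >= self.threshold

    def on_pb_hit(self, follower):
        """Case 1: the follower used a prefetch from this leader."""
        self.counters[follower] = min(self.max_value, self.counters[follower] + 1)

    def on_bad_prefetch(self, follower):
        """Case 3: the follower evicted a prefetch from this leader unused."""
        self.counters[follower] = max(0, self.counters[follower] - 1)


core1 = ConfidenceCounters(leader_id=1, num_cores=4)
core1.on_bad_prefetch(3)         # core 3 reports an unused prefetch
print(core1.should_prefetch(3))  # False
print(core1.should_prefetch(0))  # True
```

With 2-bit counters the threshold is 2, so a single bad-prefetch message from a follower sitting at the threshold is enough to stop pushes to it until a later PB hit restores confidence.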
Distance-Based Cross-Core Prefetching • Although the cores are missing on different virtual pages, they can both have the same distance pattern in their misses • Record repetitive distance-pairs to find the next predicted distance and hence the next virtual pages. • Find the stride patterns
Distance-Based Cross-Core Prefetching • 1. On a PB miss, calculate the current distance (current TLB miss virtual page - last TLB miss virtual page) • 2. Look up the distance table (DT) using the current distance and the last distance • 3. The DT returns predicted future distances from the stored distance-pairs, e.g. (1,2), (2,1), ... • 4. The predicted distances are added to the current virtual page to calculate the corresponding virtual pages, whose translations are inserted into the PB
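The four steps above can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware organization from the talk: the class name `DistancePrefetcher`, the single table shared by all cores, the table being keyed by one distance and storing the distances seen to follow it, and the `max_predictions` cap are all choices made for illustration.

```python
class DistancePrefetcher:
    """Toy distance table (DT) shared by all cores."""

    def __init__(self, max_predictions=2):
        self.table = {}          # distance -> distances observed to follow it
        self.last_page = {}      # per-core virtual page of the last miss
        self.last_distance = {}  # per-core distance of the last miss
        self.max_predictions = max_predictions  # assumed cap, must be >= 1

    def on_miss(self, core_id, vpage):
        """Record a miss and return the virtual pages predicted to miss next."""
        predictions = []
        if core_id in self.last_page:
            # Step 1: distance between this miss and the previous one.
            current = vpage - self.last_page[core_id]
            # Steps 2-4: look up distances that followed `current` before and
            # turn them into candidate virtual pages for the PB.
            predictions = [vpage + d for d in self.table.get(current, [])]
            # Update: remember that `current` followed the previous distance.
            previous = self.last_distance.get(core_id)
            if previous is not None:
                followers = self.table.setdefault(previous, [])
                if current not in followers:
                    followers.append(current)
                    del followers[:-self.max_predictions]  # keep the newest few
            self.last_distance[core_id] = current
        self.last_page[core_id] = vpage
        return predictions


dt = DistancePrefetcher()
for page in [3, 4, 6, 7]:      # core 0 trains the table
    dt.on_miss(0, page)
for page in [7, 8, 10, 11]:    # core 1 benefits from core 0's pattern
    print(page, dt.on_miss(1, page))
# 7 []
# 8 [10]
# 10 [11]
# 11 [13]
```

In this run core 1's misses on pages 10 and 11 are both predicted one miss ahead using distance-pairs learned from core 0, even though core 0 never touched those pages.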
Result 16 entries in PB, Average 46%