
Cache coherence for CMPs


Presentation Transcript


  1. Cache coherence for CMPs Miodrag Bolic

  2. Private cache • Each cache bank is private to a particular core • Cache coherence is maintained at the L2 cache level • Examples: Intel Montecito [81], AMD Opteron [56], and IBM POWER6 [63]

  3. Private cache • Advantages: short L2 cache access latency; small amount of network traffic generated, since the local L2 cache bank can filter most of the memory requests, limiting the number of coherence messages injected into the interconnection network • Disadvantages: data blocks can get duplicated; and if the working set accessed by the different cores is not well balanced, some caches can be over-utilized whilst others are under-utilized

  4. Shared cache • Cache coherence is maintained at the L1 level • The bits usually chosen to map a block to a particular bank are the least significant ones • Examples: Piranha [16], Hydra [47], Sun UltraSPARC T2 [105], and Intel Merom [104]
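A minimal sketch of that mapping, assuming 64-byte cache lines and 16 L2 banks (both values are illustrative, not taken from the slides):

    LINE_BYTES = 64   # assumed cache-line size
    NUM_BANKS = 16    # assumed number of L2 banks

    def home_bank(phys_addr: int) -> int:
        """Pick the L2 bank from the least-significant bits of the
        block address (the bits just above the block offset)."""
        block_addr = phys_addr // LINE_BYTES   # strip the offset within the line
        return block_addr % NUM_BANKS          # low bits select the bank

    # Consecutive lines land in consecutive banks (fine-grain interleaving):
    for addr in range(0, 4 * LINE_BYTES, LINE_BYTES):
        print(hex(addr), "->", home_bank(addr))

Using the low-order bits spreads consecutive lines round-robin over the banks, which is what balances the load across them.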

  5. Shared caches • Advantages: a single copy of each block; workload balancing, since the utilization of each cache bank does not depend on the working set accessed by each core (blocks are uniformly distributed among the banks in a round-robin fashion), so the aggregate cache capacity is better used • Disadvantage: many requests are serviced by remote banks (an L2 NUCA architecture)

  6. Hammer protocol • Used in AMD Opteron systems • Relies on broadcasting requests to all tiles to solve cache misses • Targets systems that use unordered point-to-point interconnection networks • On every cache miss, Hammer sends a request to the home tile; if the memory block is present on-chip, the request is forwarded to the rest of the tiles to obtain the requested block • All tiles answer the forwarded request by sending either an acknowledgement or the data message to the requesting core • The requesting core waits until it receives a response from every other tile; once all the responses have arrived, it sends an unblock message to the home tile (a sketch of this flow follows)
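A toy model of that miss flow, with a 4-tile system and function names chosen only for illustration (real Hammer implementations differ in many details):

    NUM_TILES = 4   # assumed tile count for the toy example

    def hammer_read_miss(requester: int, block: int, caches: list):
        """Toy Hammer read miss: the home tile broadcasts the request;
        every other tile answers with data or an ack; the requester
        waits for all responses, then unblocks the home tile."""
        home = block % NUM_TILES                 # home tile by address interleaving
        responses = []
        for tile in range(NUM_TILES):            # broadcast from the home tile
            if tile == requester:
                continue
            if block in caches[tile]:
                responses.append(("data", caches[tile][block]))  # holder sends the block
            else:
                responses.append(("ack", None))                  # others just acknowledge
        assert len(responses) == NUM_TILES - 1   # must hear from every other tile
        data = next((d for kind, d in responses if kind == "data"), None)
        # The unblock message to the home tile is modeled here as a no-op.
        return data

    caches = [{}, {0x40: "lineA"}, {}, {}]       # tile 1 holds block 0x40
    print(hammer_read_miss(0, 0x40, caches))     # -> lineA

Note how the requester cannot proceed until all NUM_TILES - 1 responses arrive, which is exactly why broadcasting hurts both latency and network traffic, as the next slide explains.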

  7. Hammer protocol • Disadvantages: • Requires three hops in the critical path before the requested data block is obtained • Broadcasting invalidation messages considerably increases the traffic injected into the interconnection network and, therefore, its power consumption

  8. Directory protocol • The directory keeps track of which tiles hold a copy of each memory block. To accelerate cache misses, this directory information is not stored in main memory; instead, it is usually stored on-chip at the home tile of each block • In tiled CMPs, the directory structure is split into banks which are distributed across the tiles • Each directory bank tracks a particular range of memory blocks (a minimal sketch follows)
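A minimal sketch of that organization, assuming 64-byte lines, 16 tiles, and address-interleaved home tiles (all values illustrative):

    LINE_BYTES = 64   # assumed cache-line size
    NUM_TILES = 16    # assumed tile count

    # One directory bank per tile; each maps a block address to its sharer set.
    directory = [dict() for _ in range(NUM_TILES)]

    def home_tile(block: int) -> int:
        return block % NUM_TILES   # each bank tracks a fixed slice of the blocks

    def add_sharer(phys_addr: int, tile: int) -> None:
        block = phys_addr // LINE_BYTES
        directory[home_tile(block)].setdefault(block, set()).add(tile)

    def sharers(phys_addr: int) -> set:
        block = phys_addr // LINE_BYTES
        return directory[home_tile(block)].get(block, set())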

  9. Directory protocol • The indirection problem: every cache miss must reach the home tile before any coherence action can be performed, which adds unnecessary hops to the critical path of the cache misses • The directory memory overhead of keeping track of the sharers of each memory block can become intolerable for large-scale configurations • Example: with a 16-byte block size and 64 tiles, a full-map sharer vector (one bit per tile) costs 64 bits per 128-bit block, a 50% overhead (the arithmetic is sketched below)
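The arithmetic behind the example, assuming a full-map sharer vector of one bit per tile (the slide names only the block size and tile count; the full-map organization is an assumption):

    def directory_overhead(block_bytes: int, num_tiles: int) -> float:
        """Directory bits per block (one bit per tile) over data bits per block."""
        return num_tiles / (block_bytes * 8)

    print(directory_overhead(16, 64))   # 64 bits / 128 bits = 0.5 -> 50% overhead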

  10. Comparison of protocols

  11. Interleaving

  12. Mapping between cache entries and directory entries • One way to keep the size of the directory entries constant is to store duplicate tags (a toy illustration follows)
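A toy illustration of a duplicate-tag directory (my own sketch; the sizes are arbitrary): the home tile mirrors every core's private-cache tags, so each directory entry is a fixed-size tag copy, and the sharer set is obtained by matching tags rather than storing a per-block vector.

    NUM_TILES = 4   # assumed tile count
    SETS = 8        # assumed sets per private cache
    WAYS = 2        # assumed associativity

    # duplicate_tags[tile][set] mirrors the tags held by that tile's private cache.
    duplicate_tags = [[[None] * WAYS for _ in range(SETS)] for _ in range(NUM_TILES)]

    def split(block: int):
        return block % SETS, block // SETS   # (set index, tag)

    def on_fill(tile: int, block: int, way: int) -> None:
        """Keep the duplicated tags in sync when a tile fills a line."""
        idx, tag = split(block)
        duplicate_tags[tile][idx][way] = tag

    def sharers(block: int) -> set:
        """Tiles whose duplicated tags show the block as cached."""
        idx, tag = split(block)
        return {t for t in range(NUM_TILES) if tag in duplicate_tags[t][idx]}

    on_fill(1, 0x40, 0)
    print(sharers(0x40))   # -> {1}

Because the directory only mirrors cache tags, its total size scales with the aggregate cache capacity rather than with the number of tiles, which is what keeps the entry size constant.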
