
Token Coherence




  1. Token Coherence
     Milo M. K. Martin, Mark D. Hill, and David A. Wood
     Presented by Jerry Wu

  2. Introduction
     • Goals
       • Effectively utilize “glueless” designs for multiprocessor servers
       • Implement a low-latency cache coherence protocol that is also correct
       • Separate performance from correctness

  3. Problems and Solution
     • Traditional snooping
       • Requires a totally ordered interconnect
     • Directory
       • Adds indirection latency to cache-to-cache misses
     • Ideal coherence protocol
       • Avoids both indirection and interconnect ordering
       • This approach suffers from numerous race conditions that are difficult to handle correctly
     • Solution: Token Coherence!

  4. Token Coherence Architecture
     • Two halves to the architecture
       • Correctness substrate
         • Uses token counting to enforce safety and persistent requests to prevent starvation
         • Allows data to move around the system without concern for ordering or races
       • Performance protocol
         • Uses “hint” requests to direct the correctness substrate to send data and tokens to the requesting processor
         • TokenB protocol: Token-Coherence-using-Broadcast

  5. Correctness Substrate
     • Enforcing safety via token counting
       • Each block has T tokens at all times, one of which is the owner token
       • MOSI protocol
         • A processor can write only if it holds all T tokens (M)
         • A processor can read only if it holds at least one token (S)
         • A processor holds no tokens (I)
         • A processor holds the owner token but not all of the other tokens (O)
       • If a message contains the owner token, it must also contain data
       • A valid bit marks the case where a node holds non-owner tokens without valid data
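
The token-counting rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the paper or the talk: the class `BlockState`, the helper `tokens_conserved`, and the value `T = 4` are all hypothetical names and numbers chosen for the example. The sketch also assumes that a read or write needs valid data in addition to the required tokens, which is how the valid-bit bullet is interpreted here.

```python
from dataclasses import dataclass

T = 4  # assumed number of tokens per block; hypothetical value for illustration


@dataclass
class BlockState:
    """Hypothetical per-node state for one cache block."""
    tokens: int = 0            # tokens held, owner token included
    has_owner: bool = False    # True if one of the held tokens is the owner token
    valid_data: bool = False   # valid bit: tokens can be held without valid data

    def can_read(self) -> bool:
        # Reading needs at least one token (and, as assumed here, valid data).
        return self.tokens >= 1 and self.valid_data

    def can_write(self) -> bool:
        # Writing needs all T tokens (and, as assumed here, valid data).
        return self.tokens == T and self.valid_data

    def mosi(self) -> str:
        # Map the token count onto the MOSI labels used on the slide.
        if self.tokens == 0:
            return "I"
        if self.tokens == T:
            return "M"
        return "O" if self.has_owner else "S"


def tokens_conserved(holders) -> bool:
    """Safety invariant: exactly T tokens and exactly one owner token exist.

    `holders` must cover every place tokens can be, including messages
    still in flight (modelled here as extra BlockState objects).
    """
    return (sum(h.tokens for h in holders) == T
            and sum(h.has_owner for h in holders) == 1)
```

Because a writer must hold all T tokens, no other node can hold even one token at the same time, so no reader can coexist with a writer regardless of message ordering.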

  6. Correctness Substrate II
     • Avoiding starvation via persistent requests
       • Initiated when possible starvation is detected
       • At most one persistent request per block is active at any time
       • All nodes see the persistent request and forward all of their tokens for the block to the initiator
       • The initiator performs its memory operation before deactivating the persistent request
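
A minimal sketch of the persistent-request flow for a single block, assuming a simple FIFO arbiter. The names `Arbiter` and `forward_all_tokens` are hypothetical, the FIFO policy is an assumption made for the example, and tokens in flight are ignored for brevity; a real substrate would also forward those as they arrive.

```python
from collections import deque

T = 4  # assumed number of tokens per block; hypothetical value


class Arbiter:
    """Hypothetical arbiter for one block: at most one active persistent request."""

    def __init__(self):
        self.queue = deque()   # waiting initiators, in arrival order
        self.active = None     # node id of the active persistent request

    def _maybe_activate(self):
        # Activate the next waiting request only if none is active.
        if self.active is None and self.queue:
            self.active = self.queue.popleft()
        return self.active

    def request(self, node_id):
        self.queue.append(node_id)
        return self._maybe_activate()

    def deactivate(self, node_id):
        # Only the current initiator may deactivate, after its memory operation.
        assert node_id == self.active
        self.active = None
        return self._maybe_activate()


def forward_all_tokens(tokens, initiator):
    """Every node other than the initiator gives up all of its tokens."""
    total = sum(tokens.values())
    return {node: (total if node == initiator else 0) for node in tokens}
```

For example, with `tokens = {0: 1, 1: 2, 2: 1}`, a persistent request from node 2 leaves node 2 holding all four tokens, so it can complete a write before calling `deactivate(2)`; a request queued by node 0 in the meantime is activated only at that point.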

  7. Performance Protocol
     • No obligations for correctness! Make the common case fast (Amdahl’s Law).
     • Transient requests: fast, unordered “hint” requests that may fail
     • TokenB protocol
       • Processors broadcast all transient requests
       • Nodes respond to transient requests as in a conventional MOSI protocol
       • A failed transient request is reissued until a persistent request is activated
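
The reissue-then-escalate behaviour described on this slide can be sketched as a small retry loop. The function `handle_miss`, its callback parameters, and the default `max_reissues` value are hypothetical and chosen only for illustration; the slide does not say how many reissues happen before a persistent request is activated, and timeouts and backoff are omitted.

```python
def handle_miss(try_transient, issue_persistent, max_reissues=3):
    """Try transient requests first; fall back to a persistent request.

    try_transient:    callable returning True if the broadcast hint gathered
                      enough tokens (and data) to satisfy the miss.
    issue_persistent: callable that invokes the correctness substrate, which
                      is assumed to always succeed eventually.
    max_reissues:     assumed limit on reissues after the first attempt.
    Returns (mechanism, number_of_transient_attempts).
    """
    for attempt in range(1 + max_reissues):
        if try_transient():
            return ("transient", attempt + 1)
    # Transient requests carry no guarantee, so starvation is resolved
    # by the substrate rather than by the performance protocol.
    issue_persistent()
    return ("persistent", 1 + max_reissues)
```

The performance protocol never has to be correct on its own: if every hint fails, the persistent request still completes the miss, which is what lets the common case be optimized freely.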

  8. Evaluation: Interconnection Network
     • Two interconnection networks are used to compare token coherence with snooping
       • A 2-level tree provides the totally ordered interconnect
       • A 2D torus provides the unordered interconnect

  9. Evaluation
     • Is the number of reissued/persistent requests small?
       • Yes. 97% of cache misses are issued only once
     • Can TokenB outperform Snooping?
       • Yes. TokenB on the torus is 15-28% faster than Snooping on the tree
     • Can TokenB outperform Directory and Hammer?
       • Yes. Removing indirection from the critical path makes TokenB 17-54% faster than Directory and 8-29% faster than Hammer
     • How does TokenB’s traffic compare with Directory and Hammer?
       • Hammer uses 79-90% more traffic than TokenB; Directory uses 21-25% less
     • Is the TokenB protocol scalable?
       • No. Broadcasting limits its scalability.

  10. Questions?
