Token Coherence • Milo M. K. Martin, Mark D. Hill, and David A. Wood • Presented by Jerry Wu
Introduction • Goals • Effectively utilize “glueless” design for multiprocessor servers • Implement a low-latency cache coherence protocol that is also correct • Separate performance from correctness
Problems and Solution • Traditional Snooping • Requires a totally ordered interconnect • Directory • Adds indirection latency to cache-to-cache misses • Ideal coherence protocol • Avoids both indirection and interconnect ordering • Without ordering or indirection, this approach suffers from numerous race cases that are difficult to make correct • Solution – Token Coherence!
Token Coherence Architecture • Two halves to the architecture • Correctness substrate • Uses token counting to enforce safety and uses persistent requests to prevent starvation • Allows the movement of data around the system without concern for order or races • Performance protocol • Uses “hint” requests to direct the correctness substrate to send data and tokens to the requesting processor • TokenB protocol – Token-Coherence-using-Broadcast
Correctness Substrate • Enforcing Safety via Token Counting • Each block has T tokens at all times, one of which is the owner token • A processor can write only if it holds all T tokens • A processor can read only if it holds at least one token and valid data • If a message contains the owner token, it must also contain data • Token counts map onto MOSI states • M: the processor holds all T tokens • O: the processor holds the owner token but not all other tokens • S: the processor holds one or more non-owner tokens with valid data • I: the processor holds no tokens • A valid bit marks lines that hold non-owner tokens without valid data
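The token-counting rules on this slide can be checked with a few lines of Python. This is a minimal sketch, not the authors' implementation: the names `CacheLine`, `Message`, `can_read`, `can_write`, `message_is_legal`, and `tokens_conserved` are hypothetical, and the `valid_data` field stands in for the slide's valid bit.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Per-block token state at one processor (hypothetical layout)."""
    tokens: int = 0            # tokens held, including the owner token if present
    has_owner: bool = False    # True if one of the held tokens is the owner token
    valid_data: bool = False   # stands in for the slide's valid bit


@dataclass
class Message:
    """A coherence message carrying tokens and, optionally, data."""
    tokens: int
    has_owner: bool = False
    has_data: bool = False


def can_write(line: CacheLine, total: int) -> bool:
    # Writing requires all T tokens for the block.
    return line.tokens == total


def can_read(line: CacheLine) -> bool:
    # Reading requires at least one token; the valid bit rules out
    # lines that hold non-owner tokens without valid data.
    return line.tokens >= 1 and line.valid_data


def message_is_legal(msg: Message) -> bool:
    # A message that carries the owner token must also carry data.
    return msg.has_data or not msg.has_owner


def tokens_conserved(holders, total: int) -> bool:
    # Across all caches, memory, and in-flight messages there are exactly
    # T tokens for the block, and exactly one of them is the owner token.
    count = sum(h.tokens for h in holders)
    owners = sum(1 for h in holders if h.has_owner)
    return count == total and owners == 1
```

For example, with T = 4, a line holding all four tokens passes `can_write`, while a line holding one non-owner token without valid data passes neither `can_read` nor `can_write`.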
Correctness Substrate II • Avoiding Starvation via Persistent Requests • Initiated when possible starvation is detected • At most one persistent request is active at any time • All nodes see the persistent request and forward all tokens for the block to the initiator • The initiator performs its memory operation before deactivating the persistent request
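The persistent-request steps above can be sketched as a toy model for a single block. Everything here is an assumption made for illustration: the `PersistentRequests` class, its FIFO queue for waiting requesters, and the instantaneous token transfer are not taken from the slides, and the sketch ignores tokens that are still in flight.

```python
from collections import deque


class PersistentRequests:
    """Toy model of persistent requests for one block (hypothetical API)."""

    def __init__(self, holdings):
        self.holdings = dict(holdings)  # node name -> tokens held for the block
        self.active = None              # node whose persistent request is active
        self.waiting = deque()          # FIFO of nodes waiting to be activated

    def request(self, node):
        # At most one persistent request is active; others wait their turn.
        if self.active is None:
            self._activate(node)
        else:
            self.waiting.append(node)

    def _activate(self, node):
        self.active = node
        # Every other node forwards all of its tokens to the initiator.
        for other in self.holdings:
            if other != node:
                self.holdings[node] += self.holdings[other]
                self.holdings[other] = 0

    def complete(self, node):
        # The initiator deactivates only after performing its memory operation.
        if node != self.active:
            raise ValueError("only the active initiator can deactivate")
        self.active = None
        if self.waiting:
            self._activate(self.waiting.popleft())
```

A usage example: starting from `{"P0": 1, "P1": 2, "P2": 1}`, calling `request("P2")` leaves P2 holding all four tokens, so it can write; a later `request("P0")` waits until `complete("P2")` is called.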
Performance Protocol • No obligations for correctness! Make the common case fast (Amdahl’s Law). • Transient requests – fast, unordered “hint” requests which may fail • TokenB protocol • Processors broadcast all transient requests • Nodes respond to transient requests as in a common MOSI protocol • A failed transient request is reissued until a persistent request is activated
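The reissue-then-escalate behaviour described above can be sketched as a small retry loop. This is a hedged illustration only: `acquire`, `try_transient`, `issue_persistent`, and the `max_attempts` threshold are hypothetical names and values, not parameters given in the slides.

```python
def acquire(try_transient, issue_persistent, max_attempts=4):
    """Try fast transient requests first, then fall back to a persistent one.

    try_transient: callable returning True if the broadcast hint gathered
        enough tokens (and data) for the memory operation.
    issue_persistent: callable that invokes the correctness substrate's
        persistent request, which is guaranteed to succeed eventually.
    max_attempts: assumed limit on transient attempts before escalating.
    """
    for attempt in range(1, max_attempts + 1):
        if try_transient():
            # Common case: the hint worked, no persistent request needed.
            return ("transient", attempt)
    # Possible starvation detected: hand the miss to the correctness substrate.
    issue_persistent()
    return ("persistent", max_attempts)
```

The design point this illustrates is the split between the two halves: the loop may fail any number of times without harming correctness, because the fallback path alone guarantees forward progress.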
Evaluation: interconnection network • Two different interconnection networks are used in evaluating token coherence vs. snooping. • A 2-level tree is used as the totally ordered interconnect • A 2D torus is used as the unordered interconnect
Evaluation • Is the number of reissued/persistent requests small? • Yes. 97% of cache misses are issued only once • Can TokenB outperform Snooping? • Yes. TokenB on Torus is faster than Snooping on Tree by 15-28% • Can TokenB outperform Directory and Hammer? • Yes. Removing indirection from the critical path makes TokenB 17-54% faster than Directory and 8-29% faster than Hammer • How does TokenB’s traffic compare with Directory and Hammer? • Hammer uses 79-90% more traffic than TokenB; Directory uses 21-25% less • Is the TokenB protocol scalable? • No. Broadcasting limits its scalability.