Token Coherence

Token Coherence Milo M. K. Martin, Mark D. Hill, and David A. Wood Presented By Jerry Wu

Introduction • Goals • Effectively utilize “glueless” design for multiprocessor servers • Implement a low-latency cache coherence protocol that is also correct • Separating performance from correctness

Problems and Solution • Traditional Snooping • Requires a totally-ordered interconnect • Directory • Indirection latency for cache-to-cache misses • Ideal coherence protocol • Avoids both indirection and interconnect ordering • A naive protocol of this kind suffers from numerous races that are difficult to handle correctly • Solution – Token Coherence!

Token Coherence Architecture • Two halves to the architecture • Correctness substrate • Uses token counting to enforce safety and uses persistent requests to prevent starvation • Allows the movement of data around the system without concern for order or races • Performance protocol • Uses “hint” requests to direct the correctness substrate to send data and tokens to the requesting processor • TokenB protocol – Token-Coherence-using-Broadcast

Correctness Substrate • Enforcing Safety via Token Counting • Each block has T tokens at all times, one of which is the owner token • MOSI protocol • Processor can write only if it has all T tokens (M) • Processor can read only if it has at least one token and valid data (S) • Processor has no tokens (I) • Processor has the owner token but not all other tokens (O) • If a message contains the owner token, it must also contain data • A valid bit marks whether held tokens come with valid data, since non-owner tokens may arrive without data
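The token-counting rules above can be sketched in a few lines of Python. This is only an illustrative model, not the paper's implementation: the names `BlockState`, `mosi_state`, `can_read`, `can_write`, and `message_ok`, and the value T = 4, are assumptions made for the example.

```python
from dataclasses import dataclass

T = 4  # assumed number of tokens per block (hypothetical value)


@dataclass
class BlockState:
    tokens: int = 0           # tokens held for this block, owner token included
    has_owner: bool = False   # True if the owner token is among them
    valid_data: bool = False  # the valid bit: tokens may be held without data


def mosi_state(b: BlockState, total: int = T) -> str:
    """Map a token count to the MOSI-style state named on the slide.

    The valid bit is checked separately by can_read/can_write.
    """
    if b.tokens == 0:
        return "I"   # no tokens
    if b.tokens == total:
        return "M"   # all T tokens
    if b.has_owner:
        return "O"   # owner token, but not all tokens
    return "S"       # at least one non-owner token


def can_read(b: BlockState) -> bool:
    # Reading needs at least one token and valid data.
    return b.tokens >= 1 and b.valid_data


def can_write(b: BlockState, total: int = T) -> bool:
    # Writing needs all T tokens (which include the owner token).
    return b.tokens == total and b.valid_data


def message_ok(has_owner_token: bool, has_data: bool) -> bool:
    # A message carrying the owner token must also carry data.
    return has_data or not has_owner_token
```

Because the T tokens are conserved, at most one processor can hold all of them, so a writer and a reader can never coexist for the same block, regardless of the order in which messages arrive.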

Correctness Substrate II • Avoiding Starvation via Persistent Requests • Initiated when possible starvation is detected • At most one persistent request is active at any time • All nodes see the persistent request and forward all tokens for the block to the initiator • The initiator performs its memory operation before deactivating the persistent request
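A minimal sketch of the persistent-request behaviour listed above, for a single block. It assumes a single hypothetical arbiter object that queues requests and activates one at a time; the class name `PersistentArbiter` and its methods are invented for illustration, and tokens in flight on the interconnect are not modelled.

```python
from collections import deque


class PersistentArbiter:
    """Toy model: one block, one arbiter, at most one active persistent request."""

    def __init__(self, holdings, total_tokens):
        self.holdings = dict(holdings)  # node name -> tokens currently held
        self.total = total_tokens
        self.waiting = deque()          # queued persistent requests
        self.active = None              # node whose request is active, if any

    def request(self, node):
        """A node that suspects starvation issues a persistent request."""
        self.waiting.append(node)
        self._activate_next()

    def _activate_next(self):
        if self.active is not None or not self.waiting:
            return
        self.active = self.waiting.popleft()
        # Every other node sees the active request and forwards all its tokens.
        for node in self.holdings:
            if node != self.active:
                self.holdings[self.active] += self.holdings[node]
                self.holdings[node] = 0

    def complete(self, node):
        """The initiator performs its memory operation, then deactivates."""
        assert node == self.active, "only the active initiator may deactivate"
        assert self.holdings[node] == self.total, "initiator must hold all tokens"
        self.active = None
        self._activate_next()
```

For example, with holdings `{"P0": 1, "P1": 2, "P2": 1}` and requests from P0 and then P2, P0 is activated first and collects all four tokens; P2's request is activated only after P0 calls `complete`.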

Performance Protocol • No obligations for correctness! Make the common case fast (Amdahl’s Law). • Transient requests – fast, unordered “hint” requests which may fail • TokenB protocol • Processors broadcast all transient requests • Nodes respond to transient requests as in a conventional MOSI protocol • Failed transient requests are reissued until a persistent request is activated
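The reissue policy above can be sketched as a small retry loop. This is an illustration under stated assumptions rather than TokenB itself: the function `acquire`, its callback parameters, and the limit of four transient attempts are hypothetical, and a real implementation would reissue on a timeout rather than on an immediate failure signal.

```python
def acquire(try_transient, issue_persistent, max_transient=4):
    """Try unordered transient requests first; fall back to a persistent request.

    try_transient:    callable returning True if the broadcast hint obtained
                      the needed tokens and data (hypothetical callback).
    issue_persistent: callable that starts a persistent request, which the
                      correctness substrate guarantees will eventually succeed.
    max_transient:    assumed limit on transient attempts before escalating.
    """
    for attempt in range(1, max_transient + 1):
        if try_transient():
            return ("transient", attempt)
    # All hints failed: hand the request to the correctness substrate.
    issue_persistent()
    return ("persistent", max_transient)
```

Since a failed hint costs only time, the performance protocol is free to use any policy here; safety and starvation freedom come entirely from the correctness substrate.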

Evaluation: interconnection network • Two different interconnection networks are used in evaluating token coherence vs. snooping. • A 2-level tree is used as the totally-ordered interconnect • A 2D torus is used as the unordered interconnect

Evaluation • Is the number of reissued/persistent requests small? • Yes. 97% of cache misses are issued only once • Can TokenB outperform Snooping? • Yes. TokenB on Torus is faster than Snooping on Tree by 15-28% • Can TokenB outperform Directory and Hammer? • Yes. Removing indirection from the critical path makes TokenB 17-54% faster than Directory and 8-29% faster than Hammer • How does TokenB’s traffic compare with Directory and Hammer? • Hammer uses 79-90% more traffic than TokenB; Directory uses 21-25% less • Is the TokenB protocol scalable? • No. Broadcasting limits its scalability.

Questions?
