Token Coherence: Decoupling Performance and Correctness
Article by: Martin, Hill & Wood
Presented by: Michael Tabet
CS 7698, University of Utah

A Tale of Two Methods
• Snooping based
  • Uses totally ordered broadcasts to preserve correctness
  • Uses lots of bandwidth
  • Big (large busses) = BAD!
• Directory based
  • Uses indirection to preserve bandwidth
  • Indirection adds latency
  • Needs a directory controller

Potential Workarounds
Snooping
• Snooping is fast, but requires a bus. Big, fast busses are complex ->
• Use a virtual bus to broadcast virtually!
Directory
• Networks require lots of logic (especially big ones) ->
• Use glueless networks!

Token Coherence
Avoids indirection and gets a speedup from unordered broadcasts
Two components:
• Correctness substrate
• Performance protocol

Correctness: Speed is Good, Correctness is Better!
Need to guarantee ordered reads/writes!
Thus, use a correctness “substrate”

Correctness Invariants
1. At all times, each block has T tokens
2. A processor can write a block only if it holds all T tokens
3. A processor can read a block only if it holds at least one token
4. If a coherence message contains one or more tokens, it must contain data

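The four invariants amount to simple token bookkeeping, which the following minimal Python sketch illustrates. It is not code from the paper: the `Block` class, its method names, and the choice of T = 4 tokens with P3 as the home node are assumptions made for illustration.

```python
class Block:
    """Token bookkeeping for one memory block (illustrative only)."""

    def __init__(self, total_tokens, nodes, home):
        self.total = total_tokens                  # T tokens per block
        self.tokens = {node: 0 for node in nodes}
        self.tokens[home] = total_tokens           # assumed: home starts with all T

    def can_read(self, node):
        # Invariant 3: reading needs at least one token.
        return self.tokens[node] >= 1

    def can_write(self, node):
        # Invariant 2: writing needs all T tokens.
        return self.tokens[node] == self.total

    def send(self, src, dst, count, with_data=True):
        if not 1 <= count <= self.tokens[src]:
            raise ValueError("sender does not hold that many tokens")
        if not with_data:
            # Invariant 4: a message carrying tokens must carry data.
            raise ValueError("tokens must travel with data")
        self.tokens[src] -= count
        self.tokens[dst] += count
        # Invariant 1: tokens are moved, never created or destroyed.
        assert sum(self.tokens.values()) == self.total


block = Block(total_tokens=4, nodes=["P1", "P2", "P3", "P4"], home="P3")
block.send("P3", "P1", 1)
print(block.can_read("P1"), block.can_write("P3"))  # True False
```

Once P3 gives away a single token it can no longer write, and P1 can read but not write until it has collected all four.
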
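Invariant 1 Implications
Tokens are never created or destroyed, so counting them gives precise control over who may access a block.

Invariant 2 Implications
A writer holds all T tokens, so no other processor can read or write the block at the same time.

Invariant 3 Implications
A reader holds at least one token, so no other processor can hold all T tokens and write.

Invariant 4 Implications
Tokens travel with data, so a processor that holds a token also holds a valid copy of the block. This is what keeps the caches coherent.
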
Starvation
Invariants allow for ordered reads/writes, but how do we prevent starvation?
Persistent requests:
• A processor times out on transient requests
• It raises a persistent request (only one per block)
• All nodes must forward the block's tokens to the requesting node
But repeated & persistent requests only make up 1-3% of the messages

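The escalation from transient to persistent requests can be sketched as follows. This is a simplified illustration, not the paper's implementation: the `Arbiter` class, `forward_all`, `acquire`, and the retry count standing in for a real timeout are all hypothetical names and assumptions.

```python
class Arbiter:
    """Hypothetical per-block arbiter: at most one active persistent request."""

    def __init__(self):
        self.active = None
        self.waiting = []

    def request(self, node):
        if self.active is None:
            self.active = node
        elif node != self.active and node not in self.waiting:
            self.waiting.append(node)
        return self.active

    def release(self, node):
        if self.active == node:
            self.active = self.waiting.pop(0) if self.waiting else None
        return self.active


def forward_all(tokens, starver):
    """While a persistent request is active, every node hands over its tokens."""
    moved = sum(count for node, count in tokens.items() if node != starver)
    for node in tokens:
        if node != starver:
            tokens[node] = 0
    tokens[starver] += moved
    return tokens


def acquire(node, needed, tokens, arbiter, transient_attempt, max_tries=2):
    # Transient requests first; a bounded retry count stands in for the timeout.
    for _ in range(max_tries):
        transient_attempt(node, tokens)
        if tokens[node] >= needed:
            return "transient"
    # Timed out: escalate to a persistent request.
    if arbiter.request(node) == node:
        forward_all(tokens, node)
        arbiter.release(node)
        return "persistent"
    return "queued"  # another node's persistent request is active


tokens = {"P1": 0, "P2": 2, "P3": 2}
arbiter = Arbiter()
outcome = acquire("P1", needed=4, tokens=tokens, arbiter=arbiter,
                  transient_attempt=lambda node, toks: None)
print(outcome, tokens)
```

Here the transient attempts never succeed, so P1 falls back to a persistent request and ends up with every token for the block.
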
Persistent Request State Diagram
[Figure: state diagram, not reproduced]

Performance Protocol
But if you always follow the rules, it can get slow and tedious!
Tokens allow for unordered responses to requests.
This opens the door for all sorts of optimizations.

TokenB: A New Contender
Akin to an MSI snooping protocol:
• Requests are broadcast
• A block is in one of three states:
  • Modified (all tokens)
  • Shared (some tokens)
  • Invalid (no tokens)
But: the performance protocol allows for better performance!

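The MSI analogy is just a function of the token count. A small sketch, assuming a hypothetical helper named `msi_state` that is not part of the paper:

```python
def msi_state(tokens_held, total_tokens):
    """Map a token count to the analogous MSI state (illustrative helper)."""
    if not 0 <= tokens_held <= total_tokens:
        raise ValueError("token count out of range")
    if tokens_held == total_tokens:
        return "M"  # all tokens: may read and write
    if tokens_held > 0:
        return "S"  # some tokens: may read
    return "I"      # no tokens: no access


print([msi_state(n, 4) for n in range(5)])  # ['I', 'S', 'S', 'S', 'M']
```
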
TokenB: Optimized Token Counting
MSI was a bit of a lie; token counting can be optimized by altering invariants 1, 3, and 4:
1. At all times, each block has T tokens, one of which is the owner token
3. A processor can read a block only if it holds at least one token for that block and has valid data
4. If a coherence message contains the owner token, it must contain data

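The practical effect of the altered invariants is that non-owner tokens may travel without data, so holding a token no longer implies holding valid data. A minimal sketch, assuming hypothetical `Message` and `CacheLine` types that are not from the paper:

```python
from dataclasses import dataclass


@dataclass
class Message:
    tokens: int
    owner: bool = False  # does the message carry the owner token?
    data: bool = False   # does the message carry the block's data?

    def is_legal(self):
        if self.tokens < 0:
            return False
        if self.owner and self.tokens < 1:
            return False  # the owner token is itself one of the tokens
        if self.owner and not self.data:
            return False  # altered invariant 4: owner token implies data
        return True


@dataclass
class CacheLine:
    tokens: int = 0
    valid_data: bool = False

    def receive(self, msg):
        if not msg.is_legal():
            raise ValueError("illegal coherence message")
        self.tokens += msg.tokens
        self.valid_data = self.valid_data or msg.data

    def can_read(self):
        # Altered invariant 3: at least one token AND valid data.
        return self.tokens >= 1 and self.valid_data


line = CacheLine()
line.receive(Message(tokens=1))  # a plain token, sent without data
print(line.can_read())           # False
line.receive(Message(tokens=1, owner=True, data=True))
print(line.can_read())           # True
```

Sending plain tokens without data saves bandwidth, at the cost of the extra valid-data check on reads.
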
TokenB Continued: The Good Stuff
Performance:
• Tokens allow replies to be sent unordered and directly to the requester (replies are not broadcast)
This means:
• 15-28% faster than snooping
• 17-54% faster than directory
• 21-25% less bandwidth than snooping

An Example
P1 reads, then P2 writes, then P1 reads
Assume a 4-node system where P1 has an invalid copy, P2 has a shared copy, and P3 is the “home/owner” node

Example: The Snooping Way
[Diagram: nodes P1, P2, P3, P4; messages 1, 2, 3, 4, 5]
All messages broadcast!

Example: The Directory Way
[Diagram: nodes P1, P2, P3, P4 and the Directory; messages 1, 3, 2, 4, 4, 4, 4, 5, 6]
The directory processes messages 1, 3, 4, and 5!

Example: The Token Way
[Diagram: nodes P1, P2, P3, P4; messages 1 (broadcast), 2, 3 (broadcast), 4, 4, 4, 5 (broadcast), 6]

Real World Results
Examined on a tree structure (virtual broadcast) and on a 2D torus
Migratory optimization: a read request that follows a write is answered with all of the writer's tokens
Benchmarked on OLTP, SPECjbb, Apache

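The migratory optimization comes down to how many tokens a responder hands to a reader. A rough sketch of that decision, assuming a hypothetical helper `tokens_to_send` and a simplified rule where any other token holder answers a read with a single token:

```python
def tokens_to_send(held, total, wrote_block, migratory=True):
    """How many tokens a responder gives to a read request (illustrative)."""
    if held == 0:
        return 0      # nothing to give
    if migratory and wrote_block and held == total:
        return total  # pass read/write permission along with the data
    return 1          # otherwise one token is enough for a read


print(tokens_to_send(held=4, total=4, wrote_block=True))   # 4
print(tokens_to_send(held=4, total=4, wrote_block=False))  # 1
```

Handing over all tokens lets the reader write next without issuing a second request, which suits data that migrates from processor to processor.
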
Conclusion
TokenB offers good performance for small to mid-sized parallel systems
Broadcast limits scalability past 16 nodes
But other performance protocols could scale larger!