Token Coherence: Decoupling Performance and Correctness


Presentation Transcript


  1. Token Coherence: Decoupling Performance and Correctness
     Article by: Martin, Hill & Wood
     Presented by: Michael Tabet
     CS 7698, University of Utah

  2. A Tale of Two Methods
     • Snooping-based
        • Uses totally ordered broadcasts to preserve correctness
        • Uses lots of bandwidth
        • Big (large buses) = BAD!
     • Directory-based
        • Uses indirection to preserve bandwidth
        • Indirection adds latency
        • Needs a directory controller

  3. Potential Workarounds
     • Snooping
        • Snooping is fast, but requires a bus. Big, fast buses are complex ->
        • Use a virtual bus to broadcast virtually!
     • Directory
        • Networks require lots of logic (especially big ones) ->
        • Use glueless networks!

  4. Token Coherence
     Avoids indirection (unlike directories) and gains speed by broadcasting over an unordered network (unlike snooping, which needs a totally ordered one).
     Two components:
     • Correctness substrate
     • Performance protocol

  5. Correctness: Speed is Good, Correctness is Better!
     We need to guarantee ordered reads and writes!
     Thus, use a correctness “substrate”.

  6. Correctness Invariants
     1. At all times, each block has T tokens
     2. A processor can write a block only if it holds all T tokens
     3. A processor can read a block only if it holds at least one token
     4. If a coherence message contains one or more tokens, it must contain data
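The four invariants can be made concrete with a small token ledger for a single block. This is a minimal sketch, not the paper's implementation: the value T = 4, the `Message`/`TokenBlock` names and the list-of-counts representation are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

T = 4  # hypothetical: the slides leave the per-block token count T unspecified


@dataclass
class Message:
    """A coherence message for one block (hypothetical representation)."""
    src: int
    dst: int
    tokens: int
    data: Optional[bytes] = None


class TokenBlock:
    """Tracks where the T tokens of a single block are, enforcing invariants 1-4."""

    def __init__(self, num_procs: int, home: int, total: int = T):
        self.total = total
        self.held = [0] * num_procs
        self.held[home] = total          # all tokens start at the home node
        self.in_flight: List[Message] = []

    def _check_conservation(self) -> None:
        # Invariant 1: held tokens plus in-flight tokens always equal T.
        count = sum(self.held) + sum(m.tokens for m in self.in_flight)
        assert count == self.total, f"{count} tokens, expected {self.total}"

    def can_write(self, p: int) -> bool:
        return self.held[p] == self.total    # invariant 2: all T tokens

    def can_read(self, p: int) -> bool:
        return self.held[p] >= 1             # invariant 3: at least one token

    def send(self, src: int, dst: int, tokens: int,
             data: Optional[bytes] = None) -> Message:
        if not 0 <= tokens <= self.held[src]:
            raise ValueError("cannot send tokens the sender does not hold")
        if tokens > 0 and data is None:      # invariant 4: tokens travel with data
            raise ValueError("a message carrying tokens must also carry data")
        self.held[src] -= tokens
        msg = Message(src, dst, tokens, data)
        self.in_flight.append(msg)
        self._check_conservation()
        return msg

    def deliver(self, msg: Message) -> None:
        self.in_flight.remove(msg)
        self.held[msg.dst] += msg.tokens
        self._check_conservation()
```

For example, with `TokenBlock(num_procs=4, home=2)`, sending one token plus data from node 2 to node 0 and delivering it leaves node 0 able to read but not write, while a token-carrying message without data is rejected.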

  7. Invariant 1 Implications: Tokens are never created or destroyed, which allows precise control of each block of data.

  8. Invariant 2 Implications: Provides a write-control mechanism: a writer holding all T tokens excludes every other reader and writer, so writes happen one at a time, in order.

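  9. Invariant 3 Implications: Restricts reads: a processor without a token cannot read, so no one can read while another processor holds all T tokens to write.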

  10. Invariant 4 Implications: A processor that receives tokens also receives the data, which provides a method to ensure cache coherence.
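Taken together, invariants 1-3 give the familiar "single writer or multiple readers" guarantee. The sketch below checks this exhaustively for small systems by enumerating every way to place T tokens on n processors; it assumes T ≥ 1, and the function names are hypothetical. It also shows why conservation matters: a distribution with one token too many lets a writer and a reader coexist.

```python
from itertools import product


def swmr_ok(held, total):
    """True if at most one processor may write and no one else may read meanwhile."""
    writers = [p for p, t in enumerate(held) if t == total]  # invariant 2
    readers = [p for p, t in enumerate(held) if t >= 1]      # invariant 3
    return len(writers) <= 1 and (not writers or readers == writers)


def all_distributions(n, total):
    """Every way to place `total` tokens on `n` processors (invariant 1 holds)."""
    return (h for h in product(range(total + 1), repeat=n) if sum(h) == total)


def invariants_imply_swmr(n, total):
    """Check single-writer/multiple-reader over every conserving distribution."""
    return all(swmr_ok(h, total) for h in all_distributions(n, total))
```

`invariants_imply_swmr(4, 4)` holds, whereas `swmr_ok((4, 1, 0, 0), 4)` fails: five tokens exist, so invariant 1 is broken and processor 1 could read during processor 0's write.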

  11. Starvation
      The invariants guarantee ordered reads and writes, but how do we prevent starvation?
      Persistent requests:
      • A processor times out on transient requests
      • It then raises a persistent request (only one active per block)
      • All nodes must forward the block's tokens (and data) to that node
      Reissued and persistent requests make up only 1-3% of the messages
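The escalation from transient to persistent requests might look roughly like the sketch below. It is illustrative only: the per-block FIFO arbiter, the `max_transient=2` retry limit, and the `acquire`/`PersistentArbiter` names are assumptions, not details given on the slide. Data movement is not modeled, only tokens.

```python
from collections import defaultdict, deque


class PersistentArbiter:
    """Hypothetical arbiter allowing one active persistent request per block."""

    def __init__(self):
        self.active = {}                    # block -> processor being served
        self.waiting = defaultdict(deque)   # block -> processors queued behind it

    def raise_request(self, block, proc):
        if block in self.active:
            self.waiting[block].append(proc)
        else:
            self.active[block] = proc
        return self.active[block] == proc

    def complete(self, block, proc):
        assert self.active.get(block) == proc, "only the active requester completes"
        if self.waiting[block]:
            self.active[block] = self.waiting[block].popleft()
        else:
            del self.active[block]
        return self.active.get(block)


def acquire(block, proc, held, try_transient, arbiter, max_transient=2):
    """Reissue transient requests; after repeated timeouts, go persistent."""
    for _ in range(max_transient):
        if try_transient():
            return "transient"
    if not arbiter.raise_request(block, proc):
        return "queued"
    # While the request is active, every node forwards its tokens to the
    # starving processor; tokens are moved, never created.
    total = sum(held)
    for p in range(len(held)):
        held[p] = 0
    held[proc] = total
    return "persistent"
```

A processor whose transient requests keep timing out (`try_transient` always returning `False`) ends up holding every token, while a second starving processor for the same block waits in the queue until the first completes.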

  12. Persistent Request State Diagram [figure]

  13. Performance Protocol
      But if you always follow the rules, it can get slow and tedious!
      Tokens allow responses to requests to arrive unordered.
      This opens the door for all sorts of optimizations.
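Why unordered responses are safe can be seen from the requester's side: it only counts tokens, and addition does not care about arrival order. In this sketch the token count T = 4 and the three responses of 1, 2 and 1 tokens are hypothetical values.

```python
from itertools import permutations

T = 4  # hypothetical number of tokens per block


def responses_until_write(start_tokens, arrivals, total=T):
    """Count arriving token responses until the requester holds all tokens."""
    tokens = start_tokens
    for count, n in enumerate(arrivals, start=1):
        tokens += n
        if tokens == total:
            return count        # write allowed, whatever the arrival order
    return None                 # tokens missing: the request would be reissued


responses = [1, 2, 1]  # hypothetical: three holders each send all their tokens
outcomes = {responses_until_write(0, order) for order in permutations(responses)}
```

Every permutation of the responses yields the same outcome (the write becomes legal after the third arrival), and if a response is missing the requester simply never reaches T, which is where reissued and persistent requests take over.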

  14. TokenB: A New Contender
      Akin to an MSI snooping protocol:
      • Requests are broadcast
      • Data exists in one of three states:
         • Modified (all tokens)
         • Shared (some tokens)
         • Invalid (no tokens)
      But: the performance protocol does not need a totally ordered interconnect, which allows better performance!
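The MSI analogy amounts to reading a cache's state directly off its token count. A minimal sketch, assuming T ≥ 1 (the function name is hypothetical):

```python
def msi_state(tokens: int, total: int) -> str:
    """Map a cache's token count for a block onto the familiar MSI states."""
    if not 0 <= tokens <= total:
        raise ValueError("token count out of range")
    if tokens == total:
        return "M"   # all tokens: may write
    if tokens > 0:
        return "S"   # some tokens: may read
    return "I"       # no tokens: no access
```

With T = 4, token counts 0, 1, 3 and 4 map to I, S, S and M respectively.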

  15. TokenB: Optimized Token Counting
      MSI was a bit of a lie! Token counting can be optimized by altering invariants 1, 3 and 4:
      1. At all times, each block has T tokens, one of which is the owner token
      3. A processor can read a block only if it holds at least one token for that block and has valid data
      4. If a coherence message contains the owner token, it must contain data
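The revised invariants 3 and 4 can be expressed as two small checks. This sketch uses hypothetical `Holding` and `TokenMsg` records; the point it illustrates is that a message carrying only non-owner tokens may now travel without data, while a token no longer suffices for a read if the local copy is not valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Holding:
    """One cache's state for a block under the owner-token variant (hypothetical)."""
    tokens: int        # tokens held, counting the owner token if present
    valid_data: bool   # whether the cached copy is valid


@dataclass
class TokenMsg:
    tokens: int
    carries_owner: bool
    data: Optional[bytes] = None


def can_read(h: Holding) -> bool:
    # Revised invariant 3: at least one token AND valid data.
    return h.tokens >= 1 and h.valid_data


def message_ok(m: TokenMsg) -> bool:
    if m.carries_owner and m.tokens < 1:
        return False   # the owner token is itself one of the T tokens
    # Revised invariant 4: only the owner token must travel with data.
    return (not m.carries_owner) or m.data is not None
```

Presumably this is what lets a sharer answer a write request with a small, data-free token message, since only the owner must ship the block.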

  16. TokenB Continued: The Good Stuff
      Performance gains:
      • Tokens allow replies to be sent unordered and directly to the requester (point-to-point, no broadcast)
      This means TokenB is:
      • 15-28% faster than snooping
      • 17-54% faster than directory
      • uses 21-25% less bandwidth than snooping

  17. An Example
      P1 reads, then P2 writes, then P1 reads.
      Assume a 4-node system in which P1 has an invalid copy, P2 has a shared copy, and P3 is the “home/owner” node.

  18. Example: The Snooping Way
      [Figure: messages 1-5 exchanged among P1, P2, P3 and P4]
      All messages are broadcast!

  19. Example: The Directory Way
      [Figure: messages 1-6 exchanged among P1, P2, P3, P4 and the directory]
      The directory processes messages 1, 3, 4 and 5!

  20. Example: The Token Way
      [Figure: messages 1-6 exchanged among P1, P2, P3 and P4]
      Messages 1, 3 and 5 are broadcast; messages 2, 4 (three of them) and 6 are not.
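The token-based walkthrough can be replayed in a few lines under a simplified, TokenB-style policy. Everything beyond the slide's scenario is an assumption: T = 4, the policy that on a read miss the owner answers with data and one token while on a write miss every other holder sends all of its tokens, and the placement of one token at P4 (the slide does not say where the tokens not held by P2 and P3 are).

```python
T = 4  # hypothetical number of tokens per block


def replay(held, owner, ops):
    """Replay (proc, op) requests under a simplified TokenB-style policy."""
    held = dict(held)                # do not mutate the caller's dict
    broadcasts = responses = 0
    for proc, op in ops:
        if op == "read":
            if held[proc] >= 1:
                continue             # read hit: no message needed
            broadcasts += 1          # the read request is broadcast
            held[owner] -= 1         # the owner answers with data and one token
            held[proc] += 1
            responses += 1
            if held[owner] == 0:     # last token left: ownership moves too
                owner = proc
        elif op == "write":
            if held[proc] == T:
                continue             # write hit
            broadcasts += 1          # the write request is broadcast
            for p in held:
                if p != proc and held[p] > 0:
                    held[proc] += held[p]  # each holder sends all of its tokens
                    held[p] = 0
                    responses += 1
            owner = proc
        assert sum(held.values()) == T   # invariant 1: tokens are conserved
    return broadcasts, responses, held, owner


start = {"P1": 0, "P2": 1, "P3": 2, "P4": 1}  # P4's token is an assumption
ops = [("P1", "read"), ("P2", "write"), ("P1", "read")]
```

Under these assumptions, `replay(start, "P3", ops)` issues three broadcast requests and five point-to-point responses, and ends with P2 as owner holding three tokens and P1 holding one.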

  21. Real-World Results
      Evaluated on a tree interconnect (virtual broadcast) and on a 2D torus
      Migratory optimization: a read request that follows a write receives all tokens
      Benchmarked on OLTP, SPECjbb and Apache
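The benefit of the migratory optimization shows up when a block alternates read-then-write between processors: if the reader receives all tokens, its following write is a hit. This sketch counts broadcast requests with and without the optimization; the access pattern, T = 4, and the use of the largest token holder as a stand-in for the owner are assumptions made for illustration.

```python
T = 4  # hypothetical number of tokens per block


def count_requests(pattern, migratory):
    """Count broadcast requests for a sequence of (proc, op) accesses to one block."""
    held = {"home": T}      # the home node starts with every token
    last_writer = None      # most recent writer
    requests = 0
    for proc, op in pattern:
        mine = held.get(proc, 0)
        if op == "write":
            if mine < T:
                requests += 1
                held = {p: 0 for p in held}  # every holder gives up its tokens
                held[proc] = T
            last_writer = proc
        elif mine == 0:     # read miss
            requests += 1
            source = max(held, key=held.get)  # stand-in for the owner
            fresh_write = source == last_writer and held[source] == T
            give = T if (migratory and fresh_write) else 1
            held[source] -= give
            held[proc] = held.get(proc, 0) + give
        assert sum(held.values()) == T       # invariant 1: tokens are conserved
    return requests


pattern = [(1, "read"), (1, "write"), (2, "read"),
           (2, "write"), (1, "read"), (1, "write")]
```

For this pattern, `count_requests(pattern, migratory=False)` needs six requests, while `count_requests(pattern, migratory=True)` needs four, because the writes by P2 and by P1's second turn become hits.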

  22. Results, Token vs. Snooping: TOKEN wins!

  23. Results, Directory vs. Token: Token mostly wins!

  24. Conclusion
      TokenB offers good performance for small to medium-sized parallel systems.
      Broadcasts limit scalability past 16 nodes,
      but other performance protocols could scale larger!
