
Token Coherence: Decoupling Performance and Correctness








    1. Token Coherence: Decoupling Performance and Correctness
    Milo Martin, Mark Hill, and David Wood
    Wisconsin Multifacet Project
    http://www.cs.wisc.edu/multifacet/
    University of Wisconsin—Madison

    Speaker notes — something to ponder during the presentation: why wasn't this done before?
    - Technology and workload trends; past focus on scalability
    - The mindset that each request must succeed the first time
    - Didn't think of this decoupling
    Try to avoid oral pauses ("Ok?", "Right?"). Need to separate goal setting from goal achieving.
    Other points:
    - Previous systems have not done this: unordered broadcast protocol
    - Not about logical time; give the context
    - Replacements are latency tolerant
    - Tokens in memory kept in ECC bits
    - How do we select the timeout interval?
    - Decouple the interconnect from the protocol
    - Alpha and AMD moved toward directories; IBM and Sun toward snooping
    - Remove the snooping/directory duality
    - Implementation costs: bits in caches & memory; a starvation-prevention mechanism
    - Token counting vs. token collection/finding

    2. slide 2 We See Two Problems in Cache Coherence
    1. Protocol ordering bottlenecks
    - An artifact of conservatively resolving racing requests
    - "Virtual bus" interconnect (snooping protocols)
    - Indirection (directory protocols)
    2. Protocol enhancements compound complexity
    - Fragile, error prone & difficult to reason about. Why? A distributed & concurrent system
    - Enhancements are often too complicated to implement (predictive/adaptive/hybrid protocols)
    - Performance and correctness are tightly intertwined

    Hardware shared memory has been successful, especially for cost-effective, small-scale systems. Further enhancements → more complexity:
    - Low-latency/predictive/adaptive/hybrid protocols
    - Direct communication, highly integrated nodes
    - Avoid ordered or synchronous interconnects, global snoop responses, and logical timestamping

    3. slide 3 Rethinking Cache-Coherence Protocols
    Goal of invalidation-based coherence:
    - Invariant: many readers -or- a single writer
    - Traditionally enforced by globally coordinated actions
    Enforce this invariant directly using tokens:
    - Fixed number of tokens per block
    - One token to read, all tokens to write
    - Guarantees safety in all cases
    - Global invariant enforced with only local rules
    - Independent of races, request ordering, etc.
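    The token-counting rule on this slide can be sketched in a few lines. The code below is an illustrative model, not the paper's hardware design: `TOTAL_TOKENS`, the `CacheLine` class, and its methods are hypothetical names chosen for the sketch. It shows why purely local rules (never send more tokens than you hold; read needs at least one token; write needs all of them) are enough to guarantee the many-readers-or-single-writer invariant regardless of message ordering.

    ```python
    # Hypothetical sketch of token-counting safety rules for one block.
    TOTAL_TOKENS = 4  # fixed token count per block, e.g. one per node

    class CacheLine:
        def __init__(self, tokens=0):
            self.tokens = tokens

        def can_read(self):
            # At least one token permits reading (many readers possible).
            return self.tokens >= 1

        def can_write(self):
            # All tokens are required to write (single writer guaranteed).
            return self.tokens == TOTAL_TOKENS

        def send_tokens(self, n):
            # Local rule: a cache never sends more tokens than it holds.
            assert 0 <= n <= self.tokens
            self.tokens -= n
            return n

        def receive_tokens(self, n):
            self.tokens += n
            # Conservation: a block's tokens never exceed the fixed total.
            assert self.tokens <= TOTAL_TOKENS

    # Two caches exchanging tokens for the same block:
    a = CacheLine(tokens=TOTAL_TOKENS)  # a starts as the writer
    b = CacheLine(tokens=0)
    assert a.can_write() and not b.can_read()
    b.receive_tokens(a.send_tokens(1))  # b becomes a reader
    assert a.can_read() and b.can_read() and not a.can_write()
    ```

    Because token counts are conserved locally, a writer (holding all tokens) and any reader (holding at least one) can never exist simultaneously, no matter how requests race in the interconnect.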
