Using Prediction to Accelerate Coherence Protocols

Authors: Shubhendu S. Mukherjee and Mark D. Hill
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998. Publication date: 27 Jun-1 Jul 1998. Pages 179-190.
Presenter: Naresh Sukumar

Motivation
• In multiprocessors that use directory protocols, some memory references suffer long latencies on misses to remotely cached blocks.
• To reduce this latency, standard coherence protocols have been augmented with optimizations for specific sharing patterns (e.g., read-modify-write, producer-consumer, and migratory sharing).
• This paper aims to create general prediction logic that adapts to the sharing patterns actually encountered during execution.

What will be covered?
• Introduction to the directory protocol
• General behavior of a predictor
• The Cosmos coherence message predictor
• Integrating Cosmos with a coherence protocol
• Benchmarking Cosmos
• Analysis of the results
• Conclusions

Introduction to the Directory Protocol
• The preferred method of cache coherence in large-scale shared-memory multiprocessors.
• The protocol associates state with both caches and memory at the granularity of a cache block.
• To simplify discussion, the paper considers a full-map, write-invalidate directory protocol.
[Figure: A sample of coherence messages usually found in full-map, write-invalidate coherence protocols.]
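To make "full-map, write-invalidate" concrete, here is a minimal sketch of one directory entry, assuming a simple three-state directory (idle, shared, exclusive) with one presence bit per node. The class, state, and method names are hypothetical and are not taken from the paper's protocol tables; the point is only that a write must invalidate every other holder, and that a read to a block held read-write elsewhere must first recall that copy.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = 0       # no cache holds the block
    SHARED = 1     # one or more read-only copies
    EXCLUSIVE = 2  # exactly one read-write copy


@dataclass
class FullMapEntry:
    """Directory state for one cache block: a state plus one presence bit per node."""
    num_nodes: int
    state: State = State.IDLE
    presence: list = field(init=False, default_factory=list)

    def __post_init__(self):
        self.presence = [False] * self.num_nodes

    def _holders(self, exclude):
        """Nodes other than `exclude` whose presence bit is set."""
        return [n for n, bit in enumerate(self.presence) if bit and n != exclude]

    def read(self, node):
        """Read-only request; returns the nodes the directory must contact first."""
        # Only a remote read-write copy must be recalled before the directory replies.
        contact = self._holders(node) if self.state is State.EXCLUSIVE else []
        self.presence[node] = True
        self.state = State.SHARED
        return contact

    def write(self, node):
        """Read-write request under write-invalidate; returns the nodes to invalidate."""
        contact = self._holders(node)
        self.presence = [n == node for n in range(self.num_nodes)]
        self.state = State.EXCLUSIVE
        return contact


entry = FullMapEntry(num_nodes=4)
entry.read(1)
entry.read(2)
print(entry.write(3))  # nodes 1 and 2 must be invalidated before node 3 may write
```

A full-map directory spends one bit per node per block, which keeps the invalidation list exact at the cost of storage that grows with the machine size.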

Disadvantages
• The protocol often incurs multiple long-latency operations.
• A directory may need to exchange messages with other caches before it can respond to a processor's request for a memory block.
[Figure: A store action to a block residing in another node's cache.]
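The cost of a store to a remotely cached block comes from messages that cannot overlap: the directory must hear back from the current holder before it can reply. The sketch below lists that chain and counts serialized hops. The message names (other than the get_ro_request style they imitate) and the uniform per-hop latency are assumptions for illustration only.

```python
def store_miss_messages(requester, owner, directory="dir"):
    """Messages for a store miss; `owner` is the node caching the block, or None.
    Message names are hypothetical."""
    msgs = [(requester, directory, "get_rw_request")]
    if owner is not None and owner != requester:
        # The directory cannot reply until the remote copy is invalidated.
        msgs.append((directory, owner, "inval_rw_request"))
        msgs.append((owner, directory, "inval_rw_response"))
    msgs.append((directory, requester, "get_rw_response"))
    return msgs


def miss_latency(msgs, hop_latency):
    """Each message depends on the previous one, so the network hops serialize."""
    return len(msgs) * hop_latency


for src, dst, kind in store_miss_messages("P1", "P2"):
    print(f"{src} -> {dst}: {kind}")
```

Under this simple model, a store to an uncached block costs two hops while a store to a block cached elsewhere costs four, which is the latency that prediction tries to hide.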

General Behavior of a Predictor
• Predictors forecast future sharing patterns and take actions that overlap coherence message activity with current work.
• Types (existing directed optimizations that embed such a prediction):
  • Read-modify-write
  • Pair-wise sharing
  • Dynamic self-invalidation
  • Migratory protocols
• Predictors would sit beside each standard directory and cache module to monitor coherence activity and request appropriate actions.

The Cosmos Coherence Message Predictor
• Signature patterns
• Basic structure of Cosmos
• Updating Cosmos
• Adaptability to a complex signature
• Filtering noise
• Implementation issues for Cosmos

Signature Patterns
[Figure: Sequence of message signatures seen by the producer cache, the consumer cache, and the directory.]
In a slightly more complicated example, two consumers each send a get_ro_request. As discussed later, the order in which these requests arrive does not matter.

Basic Structure of Cosmos
[Figure: Logical structure of the Cosmos coherence message predictor.]
• Two things are required:
  • The address of the cache block, because patterns may differ from block to block.
  • The history of messages for that cache block.

Basic Structure of Cosmos (contd.)
• MHT: Message History Table
• PHT: Pattern History Table
[Figure: Obtaining a prediction from Cosmos.]

Updating Cosmos
• Index into the MHT with the address of the cache block to obtain its message history register (MHR).
• Use the MHR contents to index into that block's PHT.
• Write the new <sender, type> tuple as the new prediction for the PHT entry indexed by the MHR.
• Left-shift the new <sender, type> tuple into the block's MHR.
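The lookup and update steps above can be sketched as a small two-level table: the first level maps a block address to its MHR (the last few <sender, type> tuples), and the second level maps that history to the predicted next tuple in the block's own PHT. This is a simplified software model, assuming unbounded Python dictionaries in place of the hardware tables; the class name and the message name get_rw_request are hypothetical.

```python
from collections import defaultdict, deque


class Cosmos:
    """Simplified two-level message predictor: per-block MHR plus per-block PHT."""

    def __init__(self, depth=1):
        # First level (MHT): one MHR per block holding the last `depth` tuples.
        self.mht = defaultdict(lambda: deque(maxlen=depth))
        # Second level: one PHT per block, indexed by the MHR contents.
        self.pht = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next <sender, type> tuple, or None if unseen."""
        history = tuple(self.mht[block])
        return self.pht[block].get(history)

    def update(self, block, sender, msg_type):
        """Record the message that actually arrived for `block`."""
        mhr = self.mht[block]
        # The current history indexes the PHT; the new tuple becomes its prediction.
        self.pht[block][tuple(mhr)] = (sender, msg_type)
        # Shift the new tuple into the MHR; the deque drops the oldest tuple when full.
        mhr.append((sender, msg_type))


c = Cosmos(depth=1)
for _ in range(3):
    c.update(0x40, "P1", "get_rw_request")  # producer writes the block
    c.update(0x40, "C1", "get_ro_request")  # consumer reads it
print(c.predict(0x40))
```

After a few producer-consumer rounds, the predictor at the directory expects the producer's write to follow the consumer's read. Real hardware would bound the PHT size and encode the tuples compactly.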

Adaptability to a Complex Signature
Cosmos can adapt to complex message streams. When the directory receives requests from two or three consumers, Cosmos can learn the pattern regardless of the order in which those requests arrive.
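One way to see why arrival order need not matter: with a deep enough history, each ordering gets its own PHT entry. The toy model below, assuming a single block and a stream in which a producer write (hypothetically named get_rw_request) is followed by two consumer get_ro_requests whose order alternates, counts how often the second consumer's request is predicted correctly. It is an illustration of the slide's claim, not a reproduction of the paper's experiments.

```python
def second_consumer_hits(orders, depth):
    """Replay rounds of <producer write, two consumer reads> for one block through
    a minimal depth-`depth` history predictor; count correct predictions of the
    second consumer's request."""
    write = ("P", "get_rw_request")
    pht, history, hits = {}, (), 0
    for order in orders:  # order is ("C1", "C2") or ("C2", "C1")
        round_msgs = [write] + [(c, "get_ro_request") for c in order]
        for i, msg in enumerate(round_msgs):
            if i == 2 and pht.get(history) == msg:
                hits += 1
            pht[history] = msg                      # history -> actual next message
            history = (history + (msg,))[-depth:]   # shift the tuple into the MHR
    return hits


orders = [("C1", "C2"), ("C2", "C1")] * 5
print(second_consumer_hits(orders, depth=1), second_consumer_hits(orders, depth=2))
```

With a history of one message, the entry for "C1 just arrived" keeps flipping between "C2 next" and "producer next", so it never settles. With two messages of history, <write, C1> and <write, C2> are separate entries, and after one round of each ordering every second-consumer request is predicted.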

Filtering Noise
• For example, if message B follows message A 99% of the time, then on seeing message A, Cosmos will predict that the next message is B.
• That prediction should not change if, rarely, the messages arrive in the sequence A, C, B instead of A, B.
• Use a counter and update the prediction only after two consecutive mispredictions for the same block.
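The counter-based filter can be sketched as a PHT entry that remembers how many mispredictions it has seen in a row and replaces its prediction only when that count reaches two. Keeping the counter per PHT entry, and the class name, are assumptions of this sketch.

```python
class FilteredEntry:
    """PHT entry whose prediction changes only after consecutive mispredictions."""

    def __init__(self, prediction, threshold=2):
        self.prediction = prediction
        self.threshold = threshold
        self.misses = 0  # consecutive mispredictions seen so far

    def observe(self, actual):
        if actual == self.prediction:
            self.misses = 0  # a correct prediction resets the filter
            return
        self.misses += 1
        if self.misses >= self.threshold:
            self.prediction = actual  # the pattern really changed
            self.misses = 0


entry = FilteredEntry("B")           # after A, Cosmos predicts B
for nxt in ["B", "B", "C", "B"]:     # one noisy C does not flip the prediction
    entry.observe(nxt)
print(entry.prediction)
```

A single stray message (the A, C, B case) costs one misprediction but leaves the learned pattern intact; only a persistent change retrains the entry.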

Implementation Issues for Cosmos
• Cosmos is a two-level adaptive predictor.
• The first level, containing the MHRs, can be merged with the cache block state maintained at both directories and caches.
• The second level is challenging because it may require large amounts of memory. Empirically, however, the memory overhead for 128-byte cache blocks was found to be less than 14% for an MHR depth of one.
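A rough calculation shows why the first level is cheap and the second level is the concern. The sketch below assumes a dense encoding of each <sender, type> tuple and uses hypothetical machine parameters (64 nodes, 16 message types); it bounds storage only and does not reproduce the paper's measured 14% figure, which depends on how many PHT entries are actually used.

```python
import math


def tuple_bits(num_nodes, num_msg_types):
    """Bits for one <sender, type> tuple under a hypothetical dense encoding."""
    return math.ceil(math.log2(num_nodes)) + math.ceil(math.log2(num_msg_types))


def mhr_overhead(depth, num_nodes, num_msg_types, block_bytes=128):
    """First-level (MHR) storage per block as a fraction of the block's data bits."""
    return depth * tuple_bits(num_nodes, num_msg_types) / (block_bytes * 8)


def pht_worst_case_entries(depth, distinct_tuples):
    """Upper bound on one block's PHT entries: one per possible history."""
    return distinct_tuples ** depth


print(mhr_overhead(1, 64, 16))        # MHR bits relative to a 128-byte block
print(pht_worst_case_entries(2, 10))  # grows exponentially with MHR depth
```

The MHR adds only a few bits per block, whereas the number of possible PHT indices grows exponentially with MHR depth, which is why the second level dominates the memory cost.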

Integrating Cosmos with a Coherence Protocol
• Mapping predictions to actions
• Determining when to perform actions
• Detecting and handling mispredictions
• Classes of actions:
  • Actions that move the protocol between two "legal" states
  • Actions that move the protocol state to a future state but do not expose this state to the processor
  • Actions that allow both the processor and the protocol to move to future states

Modeling the Performance
• The simple model uses the following parameters:
  • p: prediction accuracy for each message
  • f: fraction of the delay still incurred by messages predicted correctly
  • r: penalty for a mispredicted message
[Figure: A crude execution model that translates coherence message prediction rates into a parallel program's speedup.]
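One way to combine p, f, and r is to normalize each message's delay to 1, so that a correctly predicted message costs f and a mispredicted one costs 1 + r. The sketch below additionally assumes that all execution time is coherence message delay, which makes it an upper-bound view; it is an illustrative reading of the parameters, not necessarily the paper's exact equation.

```python
def modeled_speedup(p, f, r):
    """Speedup when correctly predicted messages cost f and mispredicted ones
    cost 1 + r, relative to an unpredicted message delay of 1. Assumes all
    execution time is message delay."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    time_with_prediction = p * f + (1 - p) * (1 + r)
    return 1 / time_with_prediction


print(modeled_speedup(0.8, 0.5, 0.5))
```

The expression shows the trade-off directly: a gain requires p·f + (1 − p)(1 + r) < 1, so high accuracy only pays off when mispredictions are cheap enough.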

Benchmarking Cosmos
[Figure: Benchmarks that were run.]
[Figure: Prediction accuracy for the various benchmarks.]

Analysis of the Results
• Filters increase prediction accuracy slightly, but only for predictors with an MHR depth of one.
• The time to reach steady-state prediction rates varies with the application.
• The memory overhead of Cosmos predictors is generally within 22%.

Conclusions
• Cosmos is less complex than composing the predictors of several directed optimizations into a single protocol.
• Cosmos can identify application-specific patterns that are not known a priori.
• Cosmos achieves high prediction accuracies, 80% and above, for most applications.
• Compared to other optimizations, Cosmos requires more hardware resources to store, access, and update the MHT and PHT.

Thank You
Questions?
