Using Prediction to Accelerate Coherence Protocols
Authors: Shubhendu S. Mukherjee and Mark D. Hill
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
Publication date: 27 Jun-1 Jul 1998
Pages: 179-190
Presenter: Naresh Sukumar
Motivation
• In multiprocessors that use directory protocols, some memory references suffer long latencies on misses to remotely cached blocks.
• To reduce this latency, standard coherence protocols have been augmented with optimizations for specific sharing patterns (e.g., read-modify-write, producer-consumer, and migratory sharing).
• This paper aims to create a general prediction logic that adapts to the patterns actually encountered during operation.
What will be covered?
• Introduction to the directory protocol
• General behavior of a predictor
• The Cosmos coherence message predictor
• Integrating Cosmos with a coherence protocol
• Benchmarking Cosmos
• Analysis of the results
• Conclusions
Introduction to the Directory Protocol
• Directory protocols are the preferred method of cache coherence in large-scale shared-memory multiprocessors.
• The protocol associates state with both caches and memory at the granularity of a cache block.
• To simplify the discussion, this paper considers a full-map, write-invalidate directory protocol.
Table: A sample of coherence messages usually found in full-map, write-invalidate coherence protocols.
Disadvantages
• A directory protocol often incurs multiple long-latency operations.
• A directory may need to exchange messages with other caches before it can respond to a processor's request for a memory block.
Figure: A store to a block residing in another node's cache.
General Behavior of a Predictor
• A predictor anticipates future sharing patterns and takes actions that overlap coherence message activity with current work.
• Types:
  • Read-modify-write
  • Pair-wise sharing
  • Dynamic self-invalidation
  • Migratory protocols
• A predictor sits beside each standard directory and cache module, monitors coherence activity, and requests appropriate actions.
The Cosmos Coherence Message Predictor
• Signature patterns
• Basic structure of Cosmos
• Updating Cosmos
• Adaptability to a complex signature
• Filtering noise
• Implementation issues for Cosmos
Signature Patterns
Figure: Sequence of message signatures seen by the producer cache, the consumer cache, and the directory.
In a slightly more complicated example, two consumers each send a get_ro_request. As shown later, the order in which these requests arrive does not matter.
Basic Structure of Cosmos
Figure: Logical structure of the Cosmos coherence message predictor.
• Two inputs are required:
  • The address of the cache block, because patterns may differ from one cache block to another.
  • The history of messages for that cache block.
Basic Structure of Cosmos (contd.)
• MHT: Message History Table
• PHT: Pattern History Table
Figure: Obtaining a prediction from Cosmos.
Updating Cosmos
• Index into the MHT with the address of the cache block to read its Message History Register (MHR).
• Use the MHR contents to index into the corresponding PHT.
• Write the <sender, type> tuple of the message just received as the new prediction for that PHT index.
• Left-shift the same <sender, type> tuple into the MHR for the cache block.
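The lookup and update steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware design from the paper: the class name `CosmosSketch`, the use of dictionaries in place of fixed-size tables, and the `depth` parameter name are all hypothetical.

```python
from collections import defaultdict, deque


class CosmosSketch:
    """Toy two-level predictor: a per-block MHR indexes a per-block PHT."""

    def __init__(self, depth=1):
        self.depth = depth
        # Level 1 (MHT): block address -> MHR holding the last `depth` tuples.
        self.mhr = defaultdict(lambda: deque(maxlen=depth))
        # Level 2 (PHT): block address -> {MHR contents -> predicted next tuple}.
        self.pht = defaultdict(dict)

    def predict(self, addr):
        """Return the predicted next <sender, type> tuple, or None if unknown."""
        history = tuple(self.mhr[addr])
        if len(history) < self.depth:
            return None  # MHR not yet filled, so there is no index to use
        return self.pht[addr].get(history)

    def update(self, addr, msg):
        """Record that `msg` (a <sender, type> tuple) arrived for block `addr`."""
        history = tuple(self.mhr[addr])
        if len(history) == self.depth:
            # The message that actually arrived becomes the new prediction
            # for the history that preceded it.
            self.pht[addr][history] = msg
        # Shift the new tuple into the MHR; the oldest tuple falls out.
        self.mhr[addr].append(msg)
```

With `depth=1`, feeding an alternating stream of two tuples for one block (for example `("P1", "get_rw_request")` and `("P2", "get_ro_request")`, where the processor names and the first message type are illustrative) lets the sketch predict each message correctly from the fourth message onward, once both PHT entries have been written.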
Adaptability to a Complex Signature
Cosmos can adapt to complex message streams. For example, when the directory receives messages from two or three consumers, Cosmos adapts so that its predictions are insensitive to the order in which those messages arrive.
Filtering Noise
• For example, if message B follows message A 99% of the time, then on seeing message A, Cosmos predicts that the next message will be B.
• The prediction should not change if, on rare occasions, the messages arrive in the sequence A, C, B instead of A, B.
• Use a counter, and update the prediction only after two consecutive mis-predictions for the same block.
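The counter-based filter can be illustrated for a single PHT entry as follows. This is a minimal sketch assuming one miss counter per entry and a threshold of two; the class name `FilteredPrediction` and its fields are hypothetical and are not taken from the paper.

```python
class FilteredPrediction:
    """One PHT entry that changes its prediction only after repeated misses."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # consecutive misses needed to switch
        self.prediction = None      # current predicted <sender, type> tuple
        self.misses = 0             # consecutive mis-predictions so far

    def observe(self, actual):
        """Update the entry with the message that actually arrived."""
        if self.prediction is None:
            # First observation: adopt it as the prediction.
            self.prediction = actual
            self.misses = 0
        elif actual == self.prediction:
            # Correct prediction: any earlier miss is treated as noise.
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.threshold:
                # The pattern really changed: replace the prediction.
                self.prediction = actual
                self.misses = 0
        return self.prediction
```

In this sketch a single stray C after a run of B leaves the prediction at B, while two C messages in a row switch it to C. The trade-off is that the entry is one message slower to follow a genuine change in the sharing pattern.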
Implementation Issues for Cosmos
• Cosmos is a two-level adaptive predictor.
• The first level, which contains the MHRs, can be merged with the cache block state maintained at both directories and caches.
• The second level is more challenging because it may require large amounts of memory. In practice, the measured memory overhead for 128-byte cache blocks is less than 14% at an MHR depth of one.
Integrating Cosmos with a Coherence Protocol
• Mapping predictions to actions.
• Determining when to perform actions.
• Detecting and handling mis-predictions. Actions fall into three classes:
  • Actions that move the protocol between two "legal" states.
  • Actions that move the protocol to a future state but do not expose this state to the processor.
  • Actions that allow both the processor and the protocol to move to future states.
Modeling the Performance
• The simple model uses the following parameters:
  • p: prediction accuracy for each message.
  • f: fraction of delay incurred on messages predicted correctly.
  • r: penalty due to a mis-predicted message.
Figure: A crude execution model that translates coherence message prediction rates into a parallel program's speedup.
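The slide names the three parameters but does not reproduce the formula, so the following is one plausible way to combine them rather than a transcription of the model. It assumes that every coherence message lies on the critical path, that an unpredicted message costs one unit of delay, that a correctly predicted message costs f, and that a mis-predicted message costs 1 + r. The function names are hypothetical.

```python
def relative_time(p, f, r):
    """Delay per message with prediction, relative to 1.0 without it.

    Assumption: a correct prediction (probability p) costs f,
    and a mis-prediction (probability 1 - p) costs 1 + r.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * f + (1.0 - p) * (1.0 + r)


def speedup(p, f, r):
    """Speedup over the baseline under the assumptions stated above."""
    return 1.0 / relative_time(p, f, r)
```

Under these assumptions, p = 0.8, f = 0.3, and r = 1 give a relative time of 0.64, which corresponds to a speedup of roughly 1.56. The same expression shows that prediction only pays off when p * f + (1 - p) * (1 + r) stays below 1, so a large mis-prediction penalty demands high accuracy.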
Benchmarking Cosmos
Table: Benchmarks that were run.
Table: Prediction accuracy for the various benchmarks.
Analysis of the Results
• Filters increase prediction accuracy slightly, but only for predictors with an MHR depth of one.
• The time needed to reach steady-state prediction rates varies with the application.
• The memory overhead of Cosmos predictors is generally within 22%.
Conclusions
• Cosmos is less complex than composing the predictors of several directed optimizations into a single protocol.
• Cosmos can identify application-specific patterns that are not known a priori.
• Cosmos achieves prediction accuracies of 80% and above for most applications.
• Compared to other optimizations, Cosmos requires more hardware resources to store, access, and update the MHT and PHT.
Thank You
Questions?