Cache Coherence Protocols in Shared Memory Multiprocessors
Mehmet Şenvar
Outline
• Introduction
• Background Information
• The cache coherence problem
• Cache enforcement strategies
• Consistency models
• Simple solutions
• Hardware protocols
  • Snooping protocols
  • Directory-based protocols
• Compiler and software protocols
• Future work and conclusions
The Cache Coherence Problem
• Caches allow greater performance by storing frequently used data in faster memory
• Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time
• If one processor updates the data item without informing the other processors, inconsistencies may result and cause incorrect executions
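To make the problem concrete, here is a minimal C++ sketch (not from the original slides; the `Cache` structure and addresses are purely illustrative). P1 caches a value, P0 then writes the same address through its own cache, and P1 keeps returning its stale copy because nothing tells it to refresh.

```cpp
#include <cstdio>
#include <unordered_map>

// Illustrative model: two processors with private caches over a shared memory.
struct Cache {
    std::unordered_map<int, int> lines;   // address -> cached value
    int read(int addr, const std::unordered_map<int, int>& mem) {
        auto it = lines.find(addr);
        if (it != lines.end()) return it->second;   // cache hit (possibly stale)
        int v = mem.at(addr);
        lines[addr] = v;                             // fill on miss
        return v;
    }
    void write(int addr, int val, std::unordered_map<int, int>& mem) {
        lines[addr] = val;                           // update own copy
        mem[addr] = val;                             // write-through to memory
    }
};

int main() {
    std::unordered_map<int, int> memory{{0x100, 1}};
    Cache p0, p1;
    p1.read(0x100, memory);          // P1 caches X = 1
    p0.write(0x100, 7, memory);      // P0 writes X = 7; P1 is never informed
    std::printf("P1 sees %d, memory holds %d\n",
                p1.read(0x100, memory), memory.at(0x100));  // P1 still sees 1
}
```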
Cache Coherence Problem
Cache Coherence (cont.)
• For correct execution, coherence must be enforced between the caches
• Two major factors are:
  • performance
  • implementation cost
• Four primary design issues are:
  • coherence detection strategy
  • coherence enforcement strategy
  • precision of block-sharing information
  • cache block size
Cache Enforcement Strategies
• A cache enforcement strategy is the mechanism which makes caches consistent:
  • write-update (WU)
  • write-invalidate (WI)
  • hybrid protocols, such as competitive-update (CU)
• The performance of WU and WI varies depending on the application and the number of writes
• Hybrid protocols switch between WU and WI based on the number of writes to a block
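The difference between the two enforcement actions fits in a few lines. The sketch below is illustrative only (hypothetical `CacheLine` and `Strategy` types, not from the slides): on a remote write, write-invalidate drops the local copy while write-update refreshes it in place; a competitive-update protocol would behave like WU until a counter of remote updates crosses a threshold, then invalidate.

```cpp
// Sketch of the two enforcement actions applied to one cache line.
enum class Strategy { WriteInvalidate, WriteUpdate };

struct CacheLine { bool valid = false; int value = 0; };

// Invoked on every cache that may hold 'line' when another processor writes it.
void on_remote_write(CacheLine& line, int new_value, Strategy s) {
    if (!line.valid) return;                 // not cached here: nothing to do
    if (s == Strategy::WriteInvalidate) {
        line.valid = false;                  // WI: drop the copy, refetch on next read
    } else {
        line.value = new_value;              // WU: refresh the copy in place
    }
    // A competitive-update (CU) protocol would count remote updates per line
    // and switch from the WU action to the WI action past a threshold.
}
```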
Consistency Models
• A consistency model defines how the consistency of data values is maintained
• Some consistency models are:
  • sequential consistency
  • weak consistency
  • release consistency
• Weak consistency models are more efficient to implement and require fewer coherence messages
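At the programming level, the difference shows up in how much ordering the programmer must state explicitly. As a rough illustration using standard C++11 atomics (not part of the original slides): under sequential consistency the flag-and-data idiom below needs no annotations, whereas under a weaker model the writer and reader must supply release/acquire ordering themselves.

```cpp
#include <atomic>

// The classic message-passing idiom: producer publishes data, consumer waits.
std::atomic<bool> flag{false};
int data = 0;

void producer() {
    data = 42;                                        // plain write
    flag.store(true, std::memory_order_release);      // publish: orders the write above it
}

void consumer() {
    while (!flag.load(std::memory_order_acquire)) {}  // acquire pairs with the release
    // Here data == 42 is guaranteed; with relaxed ordering it would not be.
}
```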
Shared Caches (1)
• Processors share a single cache, essentially sidestepping the coherence problem
• Useful for very small machines, e.g., the DPC in the Encore, Alliant FX/8
• Problems are limited cache bandwidth and cache interference
• Benefits are fine-grain sharing and prefetch effects
Non-cacheable Items (2)
• Make shared data non-cacheable
• One of the simplest software solutions
• In hardware, the equivalent is to mark those locations as uncacheable so they never enter any cache
Broadcast Writes (3)
• Every cache write request is sent to all other caches
• Each cache must first determine whether it holds the data
• Other copies are either updated or invalidated
• Significant additional memory transactions occur
Hardware Protocols
• Snoop bus mechanism
• Directory-based methods
  • Full directory
  • Limited directory
  • Chained directory
Snoop Bus Protocol
• Snooping protocols rely on a shared bus between the processors for coherence
• On a processor write, the write is passed through the cache to main memory on the bus
• Any processor caching the address may update or invalidate its cache entry as appropriate
• Snooping protocols do not scale well beyond 32 processors because of the shared bus
• The choice between WU, WI, and CU is especially important to reduce communication
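A minimal sketch of the snooping idea, assuming a write-invalidate policy (the `Bus` and `SnoopyCache` types are hypothetical): every controller observes every write placed on the shared bus and invalidates its own copy of the affected block.

```cpp
#include <vector>

// Illustrative snoopy cache: reacts to bus writes it observes.
struct SnoopyCache {
    struct Line { int addr; bool valid; };
    std::vector<Line> lines;

    void snoop_bus_write(int addr) {
        for (auto& l : lines)
            if (l.valid && l.addr == addr)
                l.valid = false;             // invalidate the local copy
    }
};

// Illustrative shared bus: broadcasts each write to every other cache.
struct Bus {
    std::vector<SnoopyCache*> caches;
    void broadcast_write(SnoopyCache* writer, int addr) {
        for (auto* c : caches)
            if (c != writer) c->snoop_bus_write(addr);
    }
};
```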
MESI (4-state) Invalidation Protocol
• Each line in the cache can be in one of 4 states:
  • Modified (exclusive): only in 1 cache, modified
  • Exclusive (unmodified): only in 1 cache, unmodified
  • Shared (unmodified): may be in several caches, unmodified
  • Invalid
MESI State Transition Diagram
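Since the diagram itself is not reproduced here, the sketch below spells out the standard MESI transitions as a next-state function (an illustrative model, not a full controller; bus upgrade/invalidate requests are folded into BusReadX).

```cpp
enum class State { Modified, Exclusive, Shared, Invalid };
enum class Event {
    LocalRead, LocalWrite,   // requests from this cache's own processor
    BusRead, BusReadX        // transactions snooped from other processors
};

// Next-state function for one cache line. 'others_have_copy' matters only on
// a local read miss (fill as Shared vs. Exclusive).
State mesi_next(State s, Event e, bool others_have_copy) {
    switch (e) {
    case Event::LocalRead:
        if (s == State::Invalid)                      // read miss: fetch the block
            return others_have_copy ? State::Shared : State::Exclusive;
        return s;                                     // read hit: no change
    case Event::LocalWrite:
        return State::Modified;                       // silent from M/E; invalidate others from S/I
    case Event::BusRead:
        if (s == State::Modified || s == State::Exclusive)
            return State::Shared;                     // supply data (write back if Modified)
        return s;
    case Event::BusReadX:
        return State::Invalid;                        // another cache wants ownership
    }
    return s;
}
```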
MESI Example
Directory-Based Protocols
• Directory-based protocols do not rely on a shared bus to exchange coherence information (they use point-to-point connections)
• more scalable (can have hundreds of processors)
• each processor can have its own memory
• implement weak consistency for efficiency
Directory-Based Protocols (cont.)
• Each node maintains a directory storing cache information and memory information
• A processor communicates with the directory to access memory
  • if a processor requests a non-local memory page, the directory uses its information to find the page
• The directory then uses messages to retrieve the page and ensure all other processors have consistent information
• Since the directory maintains which processors are caching the page, it only needs to send messages to those processors
Directory-Based Protocols (cont.)
• Designing a directory requires defining:
  • cache block granularity
  • cache controller design
  • directory structure
• Cache block granularity covers the size of the cache and the size of a cache line
• CC-NUMA machines have a separate cache, smaller than main memory
• COMA machines use a node's entire memory as a cache for remote pages
• Block size affects performance (false sharing)
Directory-Based Protocols (cont.)
• The cache controller is the hardware that maintains the directory and processes memory requests:
  • custom hardware
  • programmable protocol processor
• The directory structure is how the cache and memory information is organized:
  • p+1-bit full directory
  • linked-list directories
  • tagged directories
Directory Models
• Full directory
  • links to all caches for every shared location
• Limited directory
  • pointers to only some of the caches holding the shared data (n < N)
• Chained (linked) directory
  • a pointer to one cache, and from this cache to the others via a singly or doubly linked list
Directory Sample (full)
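The full-directory sample itself is not reproduced here, so the following sketch models what a p+1-bit full-map entry holds and how a write is handled: one presence bit per processor plus a dirty bit, with invalidations sent point-to-point to the current sharers. The names and the processor count are assumptions made for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr int P = 16;                          // number of processors (assumed)

// One full-map directory entry per memory block at its home node:
// P presence bits plus one dirty bit (the "p+1-bit" organization).
struct DirEntry {
    std::bitset<P> present;                    // which caches hold a copy
    bool dirty = false;                        // set when one cache holds it modified
};

struct Directory {
    std::vector<DirEntry> entries;             // one entry per block owned by this node
    explicit Directory(std::size_t blocks) : entries(blocks) {}

    // Hypothetical hook: in a real machine this is a point-to-point network message.
    void send_invalidate(int /*proc*/, std::size_t /*block*/) {}

    // On a write by 'writer': invalidate every other sharer, then record
    // exclusive, dirty ownership by the writer.
    void handle_write(std::size_t block, int writer) {
        DirEntry& e = entries[block];
        for (int p = 0; p < P; ++p)
            if (e.present[p] && p != writer)
                send_invalidate(p, block);
        e.present.reset();
        e.present.set(static_cast<std::size_t>(writer));
        e.dirty = true;
    }
};
```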
Lock-Based Protocols
• New work that promises to be more scalable than directory protocols
• Implements scope consistency, which is similar to lazy release consistency
• Coherence information is exchanged by reading and writing notices from the lock which protects the shared memory
• Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized
Software Protocols
• Software protocols enforce consistency with limited hardware support by relying either on the compiler or on specialized software handlers
• Similar to distributed shared memory (DSM) systems but at a lower level:
  • sharing usually in blocks, not pages
  • needs to be more efficient for better performance
  • architecture support for sharing
Classification of Software Protocols
• Several criteria distinguish software protocols:
  • dynamism - compile-time or run-time analysis
  • selectivity - level of coherence actions
  • restrictiveness - conservative or as-needed consistency enforcement
  • adaptivity - can the protocol adapt to access patterns
  • granularity - size and structure of coherence data
  • blocking - program block on which coherence is enforced
  • positioning - position of coherence instructions
  • updating - how memory is updated after a write
  • checking - how incoherence is detected
Software Coherence with Limited Hardware Support
• The compiler must generate consistent code, as no hardware coherence is provided
• Hardware maintains time tags which are updated on every write
• On a read, the compiler generates coherence reads which check the time tags to ensure the data is consistent
• Relies on the compiler to detect reads which may be inconsistent, while the hardware maintains the time tags
• Using tags, it is also possible to perform dynamic self-invalidation of blocks
• Many techniques are based on using these time tags
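As a rough sketch of what such a compiler-generated coherence read might look like (the structures below are hypothetical; the published schemes differ in detail): the hardware bumps a per-block time tag on every write, and the compiler inserts a staleness check before reads of shared data.

```cpp
#include <unordered_map>

// Hardware-maintained per-block time tags, modeled here as simple maps.
std::unordered_map<int, unsigned> mem_tag;    // bumped on every write to the address
std::unordered_map<int, int> mem_value;       // backing memory

struct CachedBlock { int value = 0; unsigned tag = 0; bool valid = false; };

void shared_write(int addr, int v) {
    mem_value[addr] = v;
    ++mem_tag[addr];                          // hardware updates the tag on each write
}

// Compiler-inserted "coherence read": refetch when the cached tag is stale.
int coherence_read(CachedBlock& b, int addr) {
    if (!b.valid || b.tag != mem_tag[addr]) {
        b.value = mem_value[addr];            // refetch the block
        b.tag = mem_tag[addr];
        b.valid = true;
    }
    return b.value;                           // safe to use the (re)validated copy
}
```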
Software Coherence with Limited Hardware Support (cont.)
• If the hardware has no time tags, Petersen and Li developed an algorithm which uses only page translation hardware and page status tables
• Sharing information is maintained by a software handler at the page level
• On a page access or fault, the software handler checks the sharing information, updates the page tables, and performs coherence actions
• Slower than hardware, as software handlers involve the OS and are on the critical memory access path
Enforcing Coherence by Restricting Parallelism
• Compilers can also guarantee coherence by structuring the language to limit parallelism:
  • easier to enforce coherence
  • limits the programmer and potential parallelism
  • simplifies compiler design
  • good performance can be achieved with no hardware support
• Parallel language restrictions include:
  • doall parallel loops
  • master/slave processes
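A doall loop is the natural example: iterations are independent, so coherence only has to be established at the loop boundaries. A rough modern analogue (using OpenMP, which the slides themselves do not mention) is a parallel for loop, where the implicit barrier at the end of the loop is the single point at which all updates must become visible.

```cpp
#include <vector>

// Rough analogue of a doall loop using OpenMP (illustrative only).
// Iterations touch disjoint elements, so no coherence actions are needed
// inside the loop; the implicit barrier at the end of the parallel region
// is where every processor must see all of the updates.
void scale(std::vector<double>& a, double k) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        a[i] *= k;          // disjoint writes: no sharing during the loop
    // After the implicit barrier, all of 'a' is consistent everywhere.
}
```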
Optimizing Compilers
• Optimizing compilers are designed to maintain coherence with limited hardware support, without overly restricting the programmer:
  • rely on detecting data dependencies
  • may use synchronization variables (locks, barriers)
  • can provide the hardware with hints
  • can detect when coherence is not needed
  • may have problems with dynamic sharing
  • offer good performance, but are hard to design
Future Work
• Hardware protocols are well defined, and the directory structure is near optimal
• Cost improvements can be obtained by mass producing cache controller chips
• Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)
• Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment
Conclusions
• Hardware protocols offer the best performance but require high hardware costs
• Software protocols can be used when there is no hardware support, with a slight performance penalty
• Optimizing compilers can enforce coherence or provide hints to the hardware
• A combination of hardware support and compiler optimizations is the best overall approach