Multiprocessor Cache Consistency (or, what does volatile mean?)
Andrew Whitaker
CSE451

What Does This Program Print?
public class VisiblityExample extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static boolean ready = false;

    public static void main(String[] args) {
        Thread t = new VisiblityExample();
        t.start();
        x = 2;
        y = 2;
        ready = true;
    }

    public void run() {
        while (!ready)
            Thread.yield(); // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }
}

Answer
• It’s a race condition. Many different outputs are possible:
  • x=2, y=2
  • x=1, y=2
  • x=2, y=1
  • x=1, y=1
• Or, the program may print nothing!
  • The ready loop runs forever

What’s Going on Here?
• Processor caches ($) can get out-of-sync
[Figure: four CPUs, each with its own cache ($), above a shared Memory]

A Mental Model
• Every thread/processor has its own copy of every variable
• Yikes!
// Not real code; for illustration purposes only
public class Example extends Thread {
    private static final int NUM_PROCESSORS = 4;
    private static int x[NUM_PROCESSORS];
    private static int y[NUM_PROCESSORS];
    private static boolean ready[NUM_PROCESSORS];
    // …

Two Issues
• Cache coherence
  • Do caches eventually converge on the same state?
  • All modern caches are coherent
• Cache consistency
  • When are operations by one processor visible on other processors?
  • Sometimes called “publication”
  • How much re-ordering is possible across processors?

Subjective View of Cache Consistency Strategies
[Figure: consistency strategies placed along an axis labeled "Amount of reordering", running from Strict to Relaxed; annotation: "Fast and scalable"]

Factors Pushing Towards Relaxed Consistency Models
• Hardware perspective: consistency operations are expensive
  • Writing processor must invalidate all other processors
  • Reading processor must re-validate its cached state
• Compiler perspective: optimizations frequently re-arrange memory operations to hide latency
  • These are guaranteed to be transparent, but only on a single processor

Caches 101
• Caches store blocks of main memory
  • Blocks are fairly small (perhaps 64 bytes)
• Each cache block exists in one of three states
  • Invalid, shared, exclusive
• Memory operations cause the cache block to change states
• CPUs must communicate to implement cache block state changes

Cache Block State During a Coherence Operation
[Figure: the three cache block states, Invalid, Shared (read-only), and Exclusive (read-write), with labels for the reading processors and the writing processor]

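The three block states can be sketched as a toy state machine. This is only an illustration under simplifying assumptions (one block, no data, no bus messages); the class name CacheBlockModel and its methods are made up for this sketch and do not correspond to any real protocol implementation.

```java
import java.util.Arrays;

// Toy model of ONE cache block's state in each CPU's cache.
// Illustrative sketch only; real coherence protocols have more states and messages.
class CacheBlockModel {
    enum State { INVALID, SHARED, EXCLUSIVE }

    private final State[] states; // states[i] = state of the block in CPU i's cache

    CacheBlockModel(int numCpus) {
        states = new State[numCpus];
        Arrays.fill(states, State.INVALID);
    }

    // A read needs at least a shared (read-only) copy.
    void read(int cpu) {
        if (states[cpu] != State.INVALID) return; // already readable
        for (int i = 0; i < states.length; i++) {
            // An exclusive holder must downgrade so that every copy is read-only.
            if (states[i] == State.EXCLUSIVE) states[i] = State.SHARED;
        }
        states[cpu] = State.SHARED;
    }

    // A write needs the exclusive (read-write) copy; every other copy is invalidated.
    void write(int cpu) {
        for (int i = 0; i < states.length; i++) {
            states[i] = (i == cpu) ? State.EXCLUSIVE : State.INVALID;
        }
    }

    State stateOf(int cpu) { return states[cpu]; }

    public static void main(String[] args) {
        CacheBlockModel block = new CacheBlockModel(4);
        block.read(0);
        block.read(1);
        block.write(2); // invalidates the copies held by CPUs 0 and 1
        System.out.println(Arrays.toString(block.states));
        block.read(3);  // CPU 2 downgrades from exclusive to shared
        System.out.println(Arrays.toString(block.states));
    }
}
```

The write step is where the cost shows up: it touches every other CPU's copy, which is why the number of such operations matters for scalability.
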
Some Terminology
• Publication: A CPU announces its updates to some or all of cache memory
• Fetch: A CPU loads the latest values for previously published updates

Hardware Support: Memory Fences (Barriers)
• No memory operation can be moved across a fence
  • No operation after the fence appears before the fence
  • No operation before the fence appears after the fence
• Several variants:
  • Write fences (for publication)
  • Read fences (for fetch)
  • Read/write (total) fences

Sequential Consistency
• All writes are immediately published
• All reads fetch the latest value
• All processors agree on order of memory accesses
• Every operation is a fence
• Behaves like shuffling cards

Sequential Consistency Example
Processor 1:
    A. x = 2;
    B. y = 3;
Processor 2:
    C. x = 4;
    D. y = 5;
A subset of legal orderings:
    A, B, C, D
    C, D, A, B
    C, A, D, B
    A, C, D, B
• A always appears before B
• C always appears before D

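The "shuffling cards" picture can be made concrete by enumerating every interleaving that keeps each processor's own program order. The following is a small sketch; the class name Interleavings and its methods are invented for illustration, and the operations are represented only by their labels A through D.

```java
import java.util.ArrayList;
import java.util.List;

class Interleavings {
    // Returns every merge of p1 and p2 that preserves the order within each list.
    static List<String> enumerate(List<String> p1, List<String> p2) {
        List<String> out = new ArrayList<>();
        merge(p1, 0, p2, 0, "", out);
        return out;
    }

    private static void merge(List<String> p1, int i, List<String> p2, int j,
                              String prefix, List<String> out) {
        if (i == p1.size() && j == p2.size()) {
            out.add(prefix); // both processors have finished
            return;
        }
        // Next step is either processor 1's next operation or processor 2's.
        if (i < p1.size()) merge(p1, i + 1, p2, j, prefix + p1.get(i), out);
        if (j < p2.size()) merge(p1, i, p2, j + 1, prefix + p2.get(j), out);
    }

    public static void main(String[] args) {
        // A, B run on processor 1; C, D run on processor 2.
        List<String> all = enumerate(List.of("A", "B"), List.of("C", "D"));
        System.out.println(all.size() + " legal orderings: " + all);
        // prints: 6 legal orderings: [ABCD, ACBD, ACDB, CABD, CADB, CDAB]
    }
}
```

Two operations per processor give six legal orderings in total; in each of them A precedes B and C precedes D.
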
The Cost of Sequential Consistency
• Every write requires a complete cache invalidation
  • Writing processor acquires exclusive access
  • Writing processor sends an invalidation message
  • Writing processor receives acknowledgements from all processors
• Expensive!

Relaxed Consistency Models
• Updates are published lazily
• Therefore, updates may appear out-of-order
• Challenge: Exposing a programming model that a human can understand

Release Consistency
• Observation: concurrent programs usually use proper synchronization
  • “All shared, mutable state must be properly synchronized”
• It suffices to sync-up memory during synchronized operations
• Big performance win: the number of cache coherency operations scales with synchronization, not the number of loads and stores

Simple Example
synchronized (this) {   // fetch current values
    x++;
    y++;
}                       // publish new values
• Within the critical section, updates can be re-ordered
• Without publication, updates may never be visible

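For a runnable version of this pattern, here is a minimal sketch. The class SyncCounter, its snapshot method, and the runDemo driver are assumptions added for illustration; only the synchronized block with x++ and y++ comes from the slide.

```java
class SyncCounter {
    private int x = 0;
    private int y = 0;

    void increment() {
        synchronized (this) { // lock acquire: fetch current values
            x++;
            y++;
        }                     // lock release: publish new values
    }

    // Readers take the same lock so that they fetch the published values.
    int[] snapshot() {
        synchronized (this) {
            return new int[] { x, y };
        }
    }

    // Hypothetical driver: `threads` workers each call increment() `perThread` times.
    static int[] runDemo(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.snapshot();
    }

    public static void main(String[] args) {
        int[] result = runDemo(4, 10_000);
        System.out.println("x=" + result[0] + " y=" + result[1]); // prints: x=40000 y=40000
    }
}
```

Because every access to x and y happens while holding the same lock, no increment is lost and the reader sees the final values.
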
Java Volatile Variables
• Java synchronized does double-duty
  • It provides mutual exclusion, atomicity
  • It ensures safe publication of updates
• Sometimes, we don’t want to pay the cost of mutual exclusion
• Volatile variables provide safe publication without mutual exclusion
volatile int x = 7;

More on Volatile
• Updates to volatile fields are propagated immediately
  • “Don’t cache me!”
  • Effectively, this activates sequential consistency
• Volatile serves as a fence to the compiler and hardware
  • Memory operations are not re-ordered around a volatile

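Applied to the opening program, marking the ready flag volatile is enough to rule out both the stale outputs and the infinite loop. The sketch below is a variation on that program, not the original: the class name VolatileVisibility and the observe method are assumptions introduced so the result can be returned and checked.

```java
class VolatileVisibility {
    private static int x = 1;
    private static int y = 1;
    private static volatile boolean ready = false; // volatile is the only change to the fields

    // Starts a reader thread, publishes x and y through `ready`, returns what the reader saw.
    static String observe() {
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            while (!ready)
                Thread.yield(); // the volatile read cannot be cached away
            seen[0] = "x=" + x + " y=" + y;
        });
        reader.start();
        x = 2;
        y = 2;
        ready = true; // volatile write: earlier writes to x and y are published with it
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints: x=2 y=2
    }
}
```

Note that x and y themselves are not volatile. The writes to them come before the volatile write in program order, and the reads come after the volatile read, so the fence behavior of ready covers them.
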
Rule #1, Revised
• All shared, mutable state must be properly synchronized
  • With a synchronized statement, an Atomic variable, or with volatile

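As one illustration of the Atomic option, a shared counter can be built on java.util.concurrent.atomic.AtomicInteger. The class AtomicHits and its countHits driver are hypothetical names for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicHits {
    // Hypothetical driver: `threads` workers each add `perThread` hits to one shared counter.
    static int countHits(int threads, int perThread) {
        AtomicInteger hits = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    hits.incrementAndGet(); // atomic read-modify-write, no lock needed
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 10_000)); // prints: 40000
    }
}
```

A plain volatile int would not be enough for this counter: volatile gives safe publication, but hits++ on a volatile field is still a separate read and write, so concurrent increments could be lost.
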
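Example: Lazy Initialization
• Need synchronization to ensure publication
import java.util.LinkedList;
import java.util.List;

class Example {
    static List list = null;

    // Not thread-safe as written: nothing publishes the new list to other threads
    public static List getList() {
        if (list == null) {
            list = new LinkedList();
        }
        return list;
    }
}
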
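One way to add the missing synchronization is to make the accessor synchronized. This is a sketch of one possible fix, not necessarily the one intended in the lecture; the class name SafeLazyExample and the String element type are assumptions.

```java
import java.util.LinkedList;
import java.util.List;

class SafeLazyExample {
    private static List<String> list = null;

    // synchronized gives mutual exclusion (only one thread creates the list)
    // and safe publication (every caller sees the fully constructed list).
    static synchronized List<String> getList() {
        if (list == null) {
            list = new LinkedList<>();
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(getList() == getList()); // prints: true
    }
}
```

Every call pays for the lock, including calls made long after initialization. That is the cost of mutual exclusion that motivates looking at volatile in the first place.
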