
Memory Models


sandra_john


Presentation Transcript


  1. Memory Models: In Software and in Hardware. Practical Considerations

  2. Agenda • Motivation • Factors • Levels of Memory Models • Models for software: Java, CLI • Models for hardware: IA-32, IA-64

  3. MM Motivation and Factors http://citeseer.nj.nec.com/adve95shared.html

  4. MM Motivation • Multithreaded programming • Shared memory • An example: producer/consumer queue • Thread 1: Task t = new Task(); queue.insert(t); • Thread 2: Task t = queue.get(); t.run(); • Does it work correctly? • The program performs the operations in the correct order!
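
One way to make the producer/consumer example safe in today's Java is to hand the task over through a library queue whose put/take operations are specified to order the producer's writes before the consumer's reads. The sketch below uses `java.util.concurrent.LinkedBlockingQueue`; the class `ProducerConsumerSketch`, its nested `Task`, and the `runOnce` method are hypothetical names chosen for illustration and are not part of the slides.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; names are not from the original slides.
class ProducerConsumerSketch {
    // A task with a plain (non-volatile, non-final) field.
    static final class Task {
        int payload = 41;
        int run() { return payload + 1; }
    }

    // Runs one producer and one consumer and returns what the task computed.
    static int runOnce() {
        BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
        AtomicInteger result = new AtomicInteger(0);

        // Thread 2 from the slide: get a task and run it.
        Thread consumer = new Thread(() -> {
            try {
                Task t = queue.take();   // blocks until a task is available
                result.set(t.run());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Thread 1 from the slide: construct the task, then insert it.
            Task t = new Task();
            queue.put(t);                // publication point for the task's fields
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

With a hand-written queue that has no synchronization, nothing would force the consumer to see the initialized `payload` field, which is the question the slide raises.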

  5. Memory Model Levels • Programmer-Level Models: Java MM, CLI MM, SC, Coherence, Release Consistency, etc. • Implementor-Level Models (Virtual Machine): Java Memory Model (Implementor View), Microsoft CLI • Implementor-Level Models (Hardware): IA-32, IA-64, Alpha, PowerPC, TSO, PSO, etc. • Layers between the levels: Compiler, VM

  6. Factors that Affect MM • Compiler: performs optimizations • [Virtual Machine]: yet more optimizations • Processor: performs operations out of order • Memory subsystem: delivers updates out of order

  7. MM Factors: Compiler & VM • Compilers • Store values in registers • Reorder operations • Example, as written: int x = 0, answer = 0; void f() { while (!answer) { x = x+1; } } • Example, as compiled: int x = 0, answer = 0; void f() { int tmp1 = x; int tmp2 = answer; while (!tmp2) { tmp1 = tmp1+1; } x = tmp1; } • tmp1 and tmp2 are held in registers all the time • Inside the loop: no read from memory, no write to memory
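
A minimal Java sketch of the usual remedy for the register-caching problem above: declaring the flag `volatile` so that each loop iteration must re-read it. The class name `VolatileFlagSketch`, the `runOnce` helper, and the 5-second timeout are assumptions made for this illustration only.

```java
// Hypothetical illustration; names are not from the original slides.
class VolatileFlagSketch {
    // volatile: the loop below may not keep this value in a register.
    static volatile boolean answer = false;
    static int x = 0;

    // Starts a spinning worker, sets the flag, and reports whether the
    // worker noticed the change and left its loop.
    static boolean runOnce() {
        answer = false;
        Thread worker = new Thread(() -> {
            while (!answer) {
                x = x + 1;
            }
        });
        worker.setDaemon(true);      // never keeps the JVM alive by itself
        worker.start();

        answer = true;               // volatile write, must become visible
        try {
            worker.join(5000);       // wait up to 5 s for the loop to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Without `volatile`, a JIT compiler is allowed to perform exactly the transformation shown on the slide, and the worker could then spin forever.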

  8. MM Factors: Processor • Includes a lot of features that help it tolerate memory latency • Most of them change the order of memory operations • Examples • Out-of-order execution: The most important performance-enabler of modern processors • Write combining: Reads/writes to the same cache line • Read/write buffers • Many more

  9. MM Factors: Memory Subsystem • Hardware • Cache Coherence Protocols • Software • DSM Coherence Protocols

  10. The Tradeoff • The more optimizations are there in the system, the less transparent it is to the programmer • Transparency end of the scale: Sequential Consistency • Performance end of the scale: Any Order

  11. Programmer View Models • Java – Original specification • Java – New specification • Microsoft’s CLI (.NET) specification

  12. Java MM – Original Spec • Java Language Specification, Chapter 17: http://java.sun.com/docs/books/jls/ • A. Gontmakher, A. Schuster, ACM TOCS, vol. 18, No. 4, pp. 333-386: http://www.cs.technion.ac.il/~assaf/publications/java.ps • Defines an abstract virtual machine • Really hard to understand • Non-compliant implementation by SUN (!!!) • Many other problems

  13. Java MM: Motivation • Built-in synchronization • Modeled after monitors • Integrated with memory model • Performance: Avoid synchronization • Immutable objects

  14. Java MM: The Abstract Model • Thread 1 and Thread 2 each have an execution engine and a local memory • Execution engine and local memory: use, assign • Local memory side of the transfer: load, store • Main memory side of the transfer: read, write

  15. Java MM: The Constraints • Flow in Thread 1: read x,v → load x,v → use x,v • assign x,v → store x,v → write x,v • Constraints: read x,v → load x,v • write x,v → store x,v • load x,v → use x,v • store x,v → assign x,v: not always (Prescient Stores) • … and more • Diagram: execution engine (use, assign), local memory (load, store), main memory (read, write)

  16. Java MM: Applying The Model • Thread with x==1; y=1: read x,1 • load x,1 • use x,1 • assign y,1 • store y,1 • write y,1 • Thread with y==1; x=1: read y,1 • load y,1 • use y,1 • assign x,1 • store x,1 • write x,1

  17. Java MM: How To Deal With • Determine the dependencies between use/assigns that follow from the constraints • Then, ignore all the operations except for use/assigns • Non-Operational Model!

  18. Java MM - Views • Diagram layers: use/assign, load/store, read/write • View labels: Programmer View (non-operational), Implementor View (non-operational), Programmer View (operational), Implementor View (operational)

  19. Java MM: Characterizations • Java is stronger than Coherence • Proof below • Volatile variables: Sequential Consistency • Locks: variant of Release Consistency • Semantics of locks not SC or PC (and not stated explicitly at all).

  20. Java MM – Characterizations 2 • Full definition: regular variables • Based on Legal Serialization. Constraints: [diagram over r/w x, r/w x, r x,v, w y,w] • Excludes Prescient Stores • Proof: 5+ pages • Legend: Sees a value written by another thread • Same Variable rule • Transitivity rule

  21. Java MM – Characterizations 3 • Java: full definition (regular variables only) • Constraints: [diagrams over r/w x, r x,v, r y,1, r y,2, w y,1, w y,2, w y,w] • Includes Prescient Stores • Proof: 20+ pages! • Coherence follows from the first Constraint • Legend: Writes a value seen by another thread

  22. Java MM – Coherence Proof 1: Java is not weaker than Coherence • Take operations for variable X from all threads. • Divide each thread into blocks: load-block: load (use)* • store-block: assign (use|assign)* store (use)* • Each block: one load/store operation. • Sort the blocks by their memory accesses. • Result: legal serialization of use/assigns to X.

  23. Java MM – Coherence Proof 2: Java is stronger than Coherence • Thread 1: 1 use x,1; 2 assign y,1 • Thread 2: 1 use y,1; 2 assign x,1 • Coherence: easily shown • Java (without Prescient Stores): • Transitivity Rule: 1.1 → 1.2, 2.1 → 2.2 • Legal Serialization: 2.2 → 1.1, 1.2 → 2.1 • Cycle of dependencies!
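
The cycle argument can be checked mechanically. The sketch below assumes the labels mean thread.operation (1.1 = use x,1 in Thread 1, 1.2 = assign y,1 in Thread 1, 2.1 = use y,1 in Thread 2, 2.2 = assign x,1 in Thread 2), that each thread's first operation precedes its second, and that each assign precedes the use that sees its value. `DependencyCycleSketch` is a hypothetical helper written for this illustration, using a plain depth-first search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the original proof.
class DependencyCycleSketch {
    private final Map<String, List<String>> edges = new HashMap<>();

    // Records the dependency "from must come before to".
    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Depth-first search; a node reached again while still on the
    // current path means the dependencies form a cycle.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, done, onPath)) return true;
        }
        return false;
    }

    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : edges.get(node)) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    // The four dependencies of the two-thread execution.
    static DependencyCycleSketch slideExample() {
        DependencyCycleSketch g = new DependencyCycleSketch();
        g.addEdge("1.1", "1.2"); // Thread 1: use x,1 before assign y,1
        g.addEdge("2.1", "2.2"); // Thread 2: use y,1 before assign x,1
        g.addEdge("2.2", "1.1"); // assign x,1 before the use x,1 that sees it
        g.addEdge("1.2", "2.1"); // assign y,1 before the use y,1 that sees it
        return g;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().hasCycle());
    }
}
```

Dropping any one of the four edges removes the cycle, which is why the execution becomes legal again once Prescient Stores relax the ordering inside a thread.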

  24. Java MM – Coherence Proof 3: Prescient Stores • A store can move presciently up • Before its corresponding assign • But not before another load/store • The previous execution now valid • But it can still be fixed… • Thread 1: read x,1; read y,0; read y,2; write y,1 • Thread 2: read y,1; read x,0; read x,2; write x,1 • Thread 3: write x,2; write y,2 • Diagram notes: "Necessarily has a load"; "The store, even prescient, now cannot move up"

  25. Java MM: Conclusions • Programming with Locks: easy • Programming with volatile variables: easy • Programming with regular variables: • Using just Coherence – OK • Using full definition – hard • Really accounting for Prescient Stores - nightmare

  26. New Java MM • In process, by Bill Pugh et al. • http://www.javasoft.com/aboutJava/communityprocess/jsr/jsr_133.html • http://www.cs.umd.edu/~pugh/java/memoryModel/semantics.pdf

  27. New Java MM: Motivation • Correctly synchronized programs must have SC semantics • Incorrectly synchronized programs must have (safe) semantics • Safety: JVM must never fail • Security: Prevent attacks based on unsynchronized code

  28. New Java MM: Requirements • Backward Compatibility • No new language constructs • No new VM instructions • No system-specific artifacts, e.g. garbage collection • Clear Distinction between compiler and VM • No optimizations in the compiler • Thus, VM model is the same as the one visible to the programmer • Implementability • No unrealistic requirements on software or hardware

  29. New Java MM: The Approach • Exact semantics for all memory accesses • Not really relevant • Except that SC for Properly Labelled (no data races) programs can be shown • Semantics for support of established idioms • Final fields • Volatile variables • Locks • Quite practical

  30. New Semantics of Final: Immutable objects • Many objects in Java are designed to be immutable • Rationale: avoiding synchronization • Best known example – java.lang.String • The problem: String not really immutable • Can see writes to the buffer, but not to the length and offset! • Security hole

  31. New Semantics of Final: Fixing immutable objects • Solution 1: Make ALL String methods synchronized • Serious hit at performance • Not needed on single-processor machines • Solution 2: Extending semantics of final fields • Access that reads a final field, sees it initialized • An object must not escape the constructor • Problem: String: array elements cannot be final • “weak acquire semantics”: reads dependent on the final field are seen initialized too
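
A minimal sketch of an immutable object written to the rules listed above: every field is final, the array is reachable only through a final field, and `this` does not escape the constructor. `ImmutableText` is a hypothetical stand-in and is not how `java.lang.String` is actually implemented.

```java
// Hypothetical illustration; not the real java.lang.String.
final class ImmutableText {
    private final char[] chars;  // final reference; the elements themselves cannot be final
    private final int offset;
    private final int length;

    ImmutableText(char[] source, int offset, int length) {
        // Defensive copy, so no other code holds a reference to our array.
        this.chars = source.clone();
        this.offset = offset;
        this.length = length;
        // 'this' is not stored or passed anywhere before the constructor ends.
    }

    @Override
    public String toString() {
        // The array contents are read through the final field 'chars'.
        return new String(chars, offset, length);
    }

    public static void main(String[] args) {
        char[] src = {'h', 'e', 'l', 'l', 'o'};
        System.out.println(new ImmutableText(src, 1, 3));
    }
}
```

The defensive copy is a design choice of this sketch: it keeps later writes to the caller's array from showing through, independently of any memory-model guarantee.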

  32. New Semantics for Volatile • Previously: Sequential Consistency • But: no relation with the regular operations • Not really useful for synchronization (recall the producer/consumer example) • Now: Acquire/Release Semantics • Read works as Acquire • Write works as Release
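
The release/acquire pairing can be sketched as message passing: a regular write followed by a volatile write in one thread, and a volatile read followed by a regular read in another. The names `MessagePassingSketch`, `data`, `ready`, and `runOnce` are assumptions made for this illustration.

```java
// Hypothetical illustration; names are not from the original slides.
class MessagePassingSketch {
    static int data = 0;                     // regular variable
    static volatile boolean ready = false;   // volatile flag

    // Returns the value of 'data' that the reader thread observed.
    static int runOnce() {
        data = 0;
        ready = false;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin: each iteration is a volatile read (acquire)
            }
            seen[0] = data;                  // regular read after the acquire
        });
        reader.setDaemon(true);
        reader.start();

        data = 42;                           // regular write
        ready = true;                        // volatile write (release)

        try {
            reader.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Under the acquire/release reading, once the reader sees `ready == true` it must also see `data == 42`; under the older semantics, where volatile accesses had no relation to regular operations, that conclusion would not follow.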

  33. New Semantics of Volatile: Double-Checked Locking • An object s must be created the first time it is requested: synchronized(this) { if (s==null) s = new S(); } • Slow! Locking on each access • Double-Checking: if (s==null) { synchronized(this) { if (s==null) s = new S(); } } • The reader can reorder access to s and to its fields • But, if s is volatile, it works!
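
Written out as a compilable class, the volatile variant of the idiom might look like the sketch below. `LazyHolder`, its nested class `S`, and the `get` method are hypothetical names; the local variable is a common refinement that avoids a second volatile read on the fast path and is not something the slide prescribes.

```java
// Hypothetical illustration; names are not from the original slides.
class LazyHolder {
    static final class S {
        int value = 7;   // plain field, made visible by the volatile publication
    }

    // volatile is what makes the unsynchronized first check safe.
    private volatile S s;

    S get() {
        S local = s;                 // first check, no lock taken
        if (local == null) {
            synchronized (this) {
                local = s;           // second check, under the lock
                if (local == null) {
                    local = new S();
                    s = local;       // volatile write publishes the object
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        LazyHolder h = new LazyHolder();
        System.out.println(h.get() == h.get());
    }
}
```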

  34. New Semantics of Volatile: Advanced Double-Checking • static volatile boolean initialized = false; if (!initialized) { synchronized(this) { if (!initialized) { s1 = new S(); s1.connect(…); initialized = true; }}} • Final fields won’t help

  35. New Semantics of Locks • Only locks on the same variable have acquire/release semantics • Simplifies implementation • Different locks do not synchronize anyway, so no need for acquire • In original spec, each lock is a memory barrier • Even synchronized(new Object()) {} • Compiler cannot safely remove locks • In the new semantics, recursive locks are no-op
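
A small sketch of the point that only locks on the same object synchronize with each other: two threads update a counter while holding one shared lock object, so no increment is lost. `SharedCounterSketch` and `runOnce` are hypothetical names used for illustration.

```java
// Hypothetical illustration; names are not from the original slides.
class SharedCounterSketch {
    private final Object lock = new Object();  // the one lock both threads use
    private int count = 0;                     // regular field, guarded by 'lock'

    void increment() {
        synchronized (lock) {   // acquire on entry, release on exit
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Two threads each increment the counter 'perThread' times.
    static int runOnce(int perThread) {
        SharedCounterSketch counter = new SharedCounterSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnce(10000));
    }
}
```

If `increment` used `synchronized (new Object())` instead, each call would lock a different object, the threads would not synchronize with each other, and updates could be lost.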

  36. CLI Memory Model The VM for Microsoft’s .NET http://www.ecma.ch/ecma1/STAND/ecma-335.htm Standard ECMA-335, Common Language Infrastructure Chapter 11.6, Memory Model and Optimizations

  37. CLI Memory Model • So Short!!! Just 4 pages • The system • Flat shared memory • Threads access the same memory • Any reordering of operations is permitted • Except volatile reads/writes • Except synchronous exceptions • Atomic access defined for some operations • Threading APIs define synchronization semantics

  38. CLI: Volatile Consistency • Volatile reads and writes • Accesses to volatile variables • Explicit methods: Thread.VolatileRead, Thread.VolatileWrite • Thread.MemoryBarrier – same as both VolatileRead and VolatileWrite • Volatile read – acquire semantics, volatile write – release semantics • Different threads can see different orders of volatile writes of different threads

  39. CLI: Locks • Usual locking semantics: obtaining and releasing locks • Synchronized methods • System.Threading.Monitor class – simulates C.A.R. Hoare’s monitor (only tries to; simulation is no more complete than in Java) • Acquiring lock has acquire semantics, releasing – release semantics

  40. CLI: Atomic Memory Accesses • Word-length accesses, aligned 4-byte accesses are atomic • System.Threading.Interlocked: atomic read-modify-write operations • Increment, Decrement, Exchange, CompareExchange • One and Two-byte reads are atomic. Byte writes may write the whole word

  41. Conclusions: Using CLI • All concurrent accesses might be synchronized using synchronized methods or Monitor class • Volatile variables: no common order. Probably usable in the simplest cases • Designed for accessing hardware registers. There it fits • Atomic memory access: no memory barrier semantics • Probably just forgotten • Useful in some simple cases

  42. Conclusions: Implementing CLI • Lots of disclaimers in the spec – no unimplementable requirements. Thus, implementation is straightforward • For instance, Alpha has no instruction to write a byte – implementation of atomic write would be problematic. Java has this problem • On the other hand, all low-level mechanisms are present (Interlocked)

  43. Conclusions: JVM vs. CLI • Similar semantics for locks • Except that in Java, nested locks are no-op, thus locks can be eliminated by the compiler • In Java, acquire/release happens only if synchronizing on same lock object. In CLI – full acquire/release. • Similar semantics for volatiles • Except that in CLI, consistency of volatiles is weaker. It is unclear if the Double Checked Locking idiom should work • Similarly unusable semantics for regular variables • Except for Java’s provisions for object construction (semantics of final fields) • CLI adds low-level interlocked accesses

  44. Hardware Memory Models: IA-64 and IA-32

  45. IA-32 • Memory reads: acquire semantics • Except that reads can see local writes early; see below • Memory writes: release semantics • Except that there is no global order of writes; see below • Interlocked memory accesses: using processor lock prefix

  46. IA-64: Memory Accesses • Regular memory accesses – unordered • Attributes to memory accesses: release or acquire • Acquire: ld.acq instruction • Release: st.rel instruction • Memory Fence (mf) • AKA Memory Barrier, is both acquire and release.

  47. IA-64: Atomic Accesses • CMPXCHG (Compare and Exchange) • Compare memory with a given value. Exchange if equal • Can have either acquire (cmpxchg.acq) or release (cmpxchg.rel) semantics • FAA (fetch and add) • Also acquire or release semantics • XCHG (Exchange) • Only acquire semantics
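
For readers who know compare-and-exchange from software rather than from the instruction set, a Java analogue is `AtomicInteger.compareAndSet`, which stores the new value only when the current value equals the expected one. The sketch below is not IA-64 code and makes no claim about how a JVM maps it onto cmpxchg; `CasSketch` and `doubleValue` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a compare-and-exchange retry loop.
class CasSketch {
    // Atomically doubles the value in 'cell' and returns the new value.
    static int doubleValue(AtomicInteger cell) {
        while (true) {
            int expected = cell.get();
            int updated = expected * 2;
            // Succeeds only if 'cell' still holds 'expected';
            // otherwise another thread intervened and we retry.
            if (cell.compareAndSet(expected, updated)) {
                return updated;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(21);
        System.out.println(doubleValue(cell));
    }
}
```

Any read-modify-write operation can be built this way, which is why a single compare-and-exchange primitive is enough as a hardware building block.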

  48. IA-64: Semantics of ld.acq, st.rel • Notation: >> is program order, → is execution order • Constraints: • Acquire >> X implies Acquire → X • X >> Release implies X → Release • Fence >> X implies Fence → X • X >> Fence implies X → Fence • Global order of all the strong write operations • T1: st.rel [x]=1 • T2: ld.acq r1=[x]; ld r2=[y] • T3: st.rel [y]=1 • T4: ld.acq r3=[y]; ld r4=[x] • Forbidden: r1=1, r3=1, r2=0, r4=0

  49. IA-64 Semantics: Exceptions • Load may see value from store buffer • T1: st.rel [x]=1; ld.acq r1=[x]; ld r2=[y] • T2: st.rel [y]=1; ld.acq r3=[y]; ld r4=[x] • Permitted: r1=1, r3=1, r2=0, r4=0 • Inserting mf between st.rel and ld.acq solves the problem • But: in Java semantics, this execution is OK!

  50. IA-64 Semantics: Conclusion • Simple. Clean • Very usable: direct mapping to both Java and CLI memory models • Especially fits the new Java Memory Model (or more reasonably, the new Java Memory Model especially fits IA-64 ;) • IA-32: Obviously developed before MP systems became common (for Intel processors) • Cannot change architecture now
