
Memory Consistency

Memory Consistency. Zhonghai Lu, Axel Jantsch. Outline: Introduction (What is a memory consistency model? Who should care?); Memory consistency models (strict consistency, sequential consistency, relaxed consistency models: processor consistency, weak ordering, release consistency); Summary.


Presentation Transcript


  1. Memory Consistency Zhonghai Lu, Axel Jantsch SoC Architecture

  2. Outline • Introduction • What is a memory consistency model? • Who should care? • Memory consistency models • Strict consistency • Sequential consistency • Relaxed consistency models • Processor, weak ordering, release consistency • Summary

  3. Shared memory architectures

  4. Memory Consistency Model • Specifies constraints on the order in which memory operations (from any process) can appear to execute with respect to one another • What orders are preserved? • Given a load, constrains the possible values returned by it • Without it, we can’t tell much about the execution of a shared-memory program

  5. Example of Orders • P1: (1a) A = 1; (1b) B = 2; • P2: (2a) print B; (2b) print A; /* Assume initial values of A and B are 0 */ • What’s the intuition? • Cache coherence does not say anything about the order between different variables A and B • Whatever it is, we need an ordering model for clear semantics • across different locations as well • so programmers can reason about what results are possible

  6. Memory Consistency Model • Implications for both programmer and system designer • The programmer uses it to reason about correctness and possible results • The system designer can use it to constrain how much accesses can be reordered by the compiler or hardware • Contract between programmer and system

  7. Many Consistency Models • Strict consistency (linearizability, or atomic consistency) • sequential consistency • causal consistency • release consistency • eventual consistency • delta consistency • PRAM consistency (also known as FIFO consistency) • weak consistency • vector-field consistency • fork consistency • one-copy serializability • entry consistency

  8. Goals of consistency models • Programmability: Enables programmers to reason about the behavior and correctness of programs • Performance: Imposes ordering constraints that strike a good balance between programming complexity and performance • Portability: Should be portable to different machines

  9. Strict Consistency Model • Strict consistency • Any read to a memory location X returns the value stored by the most recent (last) write operation to X, as ordered by a global clock. • For uniprocessors, ’last’ write follows the program order. What is ’last’ for multiprocessors? • Examples (assume that all variables initially have a value of 0): • P1: W(x)1; P2: R(x)1 R(x)1 is OK • P1: W(x)1; P2: R(x)0 R(x)1 is OK when R(x)0 precedes W(x)1 in global time • P1: W(x)1; P2: R(x)0 R(x)1 is not allowed (NO) when R(x)0 follows W(x)1 in global time
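
The strict consistency rule can be checked mechanically once every operation carries a global timestamp. The sketch below is a minimal illustration, not part of the original slides: the `Event` tuple layout, the function name `is_strictly_consistent`, and the timestamps in the two sample traces are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical event layout: (global_time, process, kind, location, value),
# where kind is "W" for a write and "R" for a read.
Event = Tuple[int, str, str, str, int]


def is_strictly_consistent(events: List[Event], initial: int = 0) -> bool:
    """Return True if every read returns the value of the latest earlier write."""
    memory = {}
    # Sorting the tuples orders the events by their global timestamp.
    for _time, _proc, kind, loc, value in sorted(events):
        if kind == "W":
            memory[loc] = value
        elif memory.get(loc, initial) != value:
            return False  # the read returned a stale value
    return True


# R(x)0 happens before W(x)1 in global time: allowed.
ok_trace = [(1, "P2", "R", "x", 0), (2, "P1", "W", "x", 1), (3, "P2", "R", "x", 1)]
# R(x)0 happens after W(x)1 in global time: a stale read, not allowed.
bad_trace = [(1, "P1", "W", "x", 1), (2, "P2", "R", "x", 0), (3, "P2", "R", "x", 1)]
```

The two traces contain the same operations per process; only the assumed global timestamps differ, which is exactly what strict consistency is sensitive to.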

  10. Sequential Consistency

  11. Sequential Consistency • “A multiprocessor is sequentially consistent if the result of any execution is the same as if • the operations of all the processors were executed in some sequential order, and • the operations of each individual processor appear in this sequence in the order specified by its program.” [Lamport, 1979]

  12. Sequential Consistency • (as if there were no caches, and a single memory) • Total order achieved by interleaving accesses from different processes • Maintains program order, and memory operations, from all processes, appear to [issue, execute, complete] atomically w.r.t. others • Programmer’s intuition is maintained

  13. SC example • Program order among operations from a single processor • Example 1: initially Flag1=0; Flag2=0; • P1: Flag1=1; if (Flag2==0) {critical section} • P2: Flag2=1; if (Flag1==0) {critical section} • None or one of the processes enters the critical section • Atomic execution of memory operations • Example 2: initially A=0; B=0; C=0; • P1: A=1; • P2: if (A==1) B=1; • P3: if (B==1) C=A; • C==1 if the write is atomic

  14. SC Example • P1: (1a) A = 1; (1b) B = 2; • P2: (2a) print B; (2b) print A; /* Assume initial values of A and B are 0 */ • What matters is the order in which the program appears to execute • possible outcomes for (A,B): (0,0), (1,0), (1,2); impossible under SC: (0,2) • we know 1a->1b and 2a->2b by program order • A = 0 implies 2b->1a, which implies 2a->1b • B = 2 implies 1b->2a, which leads to a contradiction • actual execution 1b->2a->2b->1a is not SC
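
The list of possible outcomes can be confirmed by brute force: enumerate every interleaving of the four operations that keeps each processor's program order and record what P2 prints. This is a small sketch for illustration; the tuple encoding of operations and the name `sc_outcomes` are assumptions made here, not something taken from the slides.

```python
from itertools import permutations

# Operations as (label, kind, variable, value); reads carry no value.
P1 = [("1a", "W", "A", 1), ("1b", "W", "B", 2)]
P2 = [("2a", "R", "B", None), ("2b", "R", "A", None)]


def sc_outcomes():
    """Return the set of (A, B) values P2 can print under sequential consistency."""
    outcomes = set()
    for order in permutations(P1 + P2):
        # Keep only interleavings that respect each processor's program order.
        if [op for op in order if op in P1] != P1:
            continue
        if [op for op in order if op in P2] != P2:
            continue
        memory = {"A": 0, "B": 0}
        printed = {}
        for _label, kind, var, value in order:
            if kind == "W":
                memory[var] = value
            else:
                printed[var] = memory[var]
        outcomes.add((printed["A"], printed["B"]))
    return outcomes
```

Only six of the 24 permutations respect both program orders, and none of them yields (0,2), which matches the contradiction argument on the slide.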

  15. Discussion on SC • Sequential consistency model • Intuitive semantics to the programmer • Easily implementable by satisfying its sufficient conditions • Write completion • Write atomicity: writes visible to all processes. • Restricts many performance optimizations in hardware and compilers. • Optimized hardware architectures without caches • Hardware architectures with caches • Compiler optimizations

  16. Canonical hardware optimization (without caches) • Write buffer • General interconnect with multiple memory modules • Overlapping write operations • Non-blocking read operations

  17. Write buffer • Write buffer • Write transaction is not complete until acknowledged • On a write, a processor simply inserts the write operation into the write buffer and proceeds without waiting for the write to complete. • Subsequent reads are allowed to bypass any previous writes in the write buffer for faster completion. • Purpose: hide the latency of write operations • Write buffers are safe to use in a uniprocessor since bypassing between operations to different locations does not lead to a violation of uniprocessor data dependence. • What happens in a multiprocessor?

  18. Write buffer • If write buffers are used, both reads of the flags can return 0. This violates the Write2Read program order (to different locations) that SC requires. • Terms t1, t2, t3, t4 indicate the order in which the corresponding read/write operations execute at memory.
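
The effect can be reproduced with a toy model of two processors that each own a FIFO write buffer, applied to the Flag1/Flag2 program from the SC example slide. This is only a sketch under simplifying assumptions: the class `BufferedProcessor`, its methods, and the chosen schedule (both writes buffered, both reads issued, then both buffers drained) are hypothetical and do not model any particular hardware.

```python
class BufferedProcessor:
    """Toy processor whose writes sit in a FIFO buffer until drained."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = []  # pending (location, value) pairs, oldest first

    def write(self, location, value):
        # The processor does not wait for the write to reach memory.
        self.write_buffer.append((location, value))

    def read(self, location):
        # A read to the same location is served from the newest buffered write;
        # a read to a different location bypasses the buffer and goes to memory.
        for pending_location, pending_value in reversed(self.write_buffer):
            if pending_location == location:
                return pending_value
        return self.memory[location]

    def drain(self):
        for location, value in self.write_buffer:
            self.memory[location] = value
        self.write_buffer.clear()


def run_flags_with_write_buffers():
    memory = {"Flag1": 0, "Flag2": 0}
    p1, p2 = BufferedProcessor(memory), BufferedProcessor(memory)
    p1.write("Flag1", 1)      # buffered, not yet in memory
    p2.write("Flag2", 1)      # buffered, not yet in memory
    r1 = p1.read("Flag2")     # bypasses P1's buffered write
    r2 = p2.read("Flag1")     # bypasses P2's buffered write
    p1.drain()
    p2.drain()
    return r1, r2
```

Under this schedule both reads return 0, so both processors would enter the critical section, an outcome that no sequentially consistent interleaving allows.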

  19. Overlapping writes • Allowing writes to different locations to be re-ordered is safe for uniprocessor programs. • What about multiprocessors? The write completion may be out of program order. • An example • Interconnection network allows concurrent transactions. • Multiple memory modules. • To exploit the concurrency allowed by the network and memory, a write to another location starts before the previous one is complete (acknowledged).

  20. Overlapping writes • For P2, when Head=1, what is the value for Data? • Since there is no guarantee that the write to Data completes before the write to Head, there is no guarantee that P2 reads Data = 2000. This violates the Write2Write program order (to different locations) that SC requires.
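
A rough way to see the problem is to give the two memory modules different latencies and apply P1's writes in completion order rather than issue order. The function below is an illustrative sketch; the name `data_seen_by_p2`, the issue times 0 and 1, and the latency parameters are assumptions made for the example.

```python
def data_seen_by_p2(data_latency, head_latency):
    """Return the value of Data that P2 reads right after it observes Head == 1."""
    # P1 issues both writes in program order (Data at time 0, Head at time 1),
    # but each write completes only after the latency of its memory module.
    completions = [
        (0 + data_latency, "Data", 2000),
        (1 + head_latency, "Head", 1),
    ]
    memory = {"Data": 0, "Head": 0}
    for _time, location, value in sorted(completions):  # apply in completion order
        memory[location] = value
        if memory["Head"] == 1:
            return memory["Data"]  # P2 leaves its wait loop and reads Data
    return None  # not reached: the write to Head always completes
```

With equal latencies the writes complete in program order and P2 reads 2000; with a slow Data module, for example `data_seen_by_p2(5, 1)`, the write to Head overtakes the write to Data and P2 reads the old value.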

  21. Nonblocking read operations • Many processors do not stall for the return value of a read operation. They can proceed past a read operation by using techniques such as speculative execution, and dynamic scheduling. • Reads (Read2Read to different locations) complete out-of-program-order. • What does this mean for multiprocessors?

  22. Nonblocking read operations • P2’s read of Data completes before its read of the updated Head. This violates the Read2Read program order (to different locations) that SC requires.

  23. Architectures with caches • More chances to reorder operations in ways that can violate sequential consistency. • E.g. a write-through cache behaves similarly to a write buffer. • Even if a read hits the cache, the processor cannot read the cached value until its previous operations in program order are complete! • Additional issues: • Need a cache coherence protocol to propagate (update, invalidate) a newly written value to all cached copies of the modified location. • Detecting when a write is complete needs more transactions. • Hard to make propagation to multiple copies atomic: more challenging to preserve the program order.

  24. Detect the completion of write operations • Suppose a write-through cache for P1 and P2 • P2 initially has Data in its cache • What if P2 reads Data from its cache after it sees Head=1, but before Data is updated? • This can be avoided if P1 waits for P2’s cache copy of Data to be updated or invalidated before proceeding with the write to Head.

  25. Maintain the illusion of atomicity for writes • All processors see writes to the same location in the same order, making writes appear atomic. • Example (A, B are cached; initially A = B = 0): • P1: A = 1 • P2: A = 2; B = 1 • P3: while (B != 1) ; result3 = A • P4: while (B != 1) ; result4 = A • P3 and P4 may see the writes to A by P1 and P2 in different orders, so result3 and result4 may get 1 and 2, respectively. This violates SC.

  26. Maintain the illusion of atomicity for writes • The value of a write must not be returned by a read until all invalidates are acknowledged. Otherwise, SC is violated. • Example • A, B, C are cached • P2 sees A=1, • P3 sees B=1, but does not see A=1, so register1=0, violating SC.

  27. Compiler optimization • Compilers can re-order memory references, similar to hardware-generated re-orderings • Register allocation example • If the compiler register-allocates the location Head on P2 (by doing a single read of Head and then reading the value within the register), the while loop may never terminate in some executions (if the single read on P2 returns the old value of Head). • This violates SC, because the loop is guaranteed to terminate in every sequentially consistent execution of the code.
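
The register allocation problem can be imitated in a few lines by comparing a wait loop that re-reads Head from memory with one that reads it once into a local variable. This is a sketch only: the `Memory` class, the rule that P1's write becomes visible after a fixed number of loads, and the `max_iters` bound standing in for "never terminates" are all assumptions made for the illustration.

```python
class Memory:
    """Toy shared memory in which P1's write to Head shows up after some loads."""

    def __init__(self, write_after_reads):
        self.cells = {"Head": 0}
        self.reads = 0
        self.write_after_reads = write_after_reads

    def load(self, location):
        self.reads += 1
        if self.reads > self.write_after_reads:
            self.cells["Head"] = 1  # P1's write has become visible
        return self.cells[location]


def loop_with_memory_reads(mem, max_iters=100):
    """Wait loop that re-reads Head on every iteration, as the source program says."""
    for i in range(max_iters):
        if mem.load("Head") != 0:
            return i  # the loop terminates at iteration i
    return None


def loop_with_register(mem, max_iters=100):
    """Wait loop after register allocation: Head is read once, outside the loop."""
    reg = mem.load("Head")
    for i in range(max_iters):
        if reg != 0:
            return i
    return None  # the old value stays in the register, so the loop never exits
```

With `Memory(3)`, the first version leaves the loop once the write becomes visible, while the register-allocated version exhausts the iteration bound because it never looks at memory again.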

  28. Summary of SC • Sequential consistency requirements: • Program order requirement: a processor must ensure that its previous memory operation is complete before proceeding with the next memory operation in program order. • A write is complete only after all invalidates or updates are acknowledged. • Write atomicity requirement: • Writes to the same location are made visible in the same order to all processors. • The value of a write should not be returned by a read until all invalidates are acknowledged. • These requirements make many hardware and compiler optimizations invalid. • Memory reference order must be strictly enforced. • Instruction scheduling, register allocation, etc.

  29. Relax the requirements • To improve performance, need to • Relax program order requirement • Read/write order for different addresses • Write2Read, Write2Write, Read2Read, Read2Write • Read/write order for the same address must always be enforced. • Relax write atomicity requirement. • Allow a read to return the value of another processor’s write before the write is complete (visible to all processors) • Relaxation related to program order and write atomicity • Allow a read to return the value of its own previous write before the write is complete.

  30. Relaxed consistency models

  31. Relaxed consistency models • Relaxation • Relaxed models that relax all program orders • Processor consistency (PC) • Weak consistency (weak ordering, WC or WO) • Release consistency (RC)

  32. Processor Consistency • Processor consistency (PC) • Writes done by a single processor are received by all other processors in the order in which they were issued, but writes from different processors may be seen in a different order by different processors • The basic idea • To better reflect the reality of networks in which the latency between different nodes can be different.

  33. Processor Consistency • Rules: 2 memory access conditions • On a given processor, before a read is allowed to perform, all previous read accesses must be performed. • On a given processor, before a write is allowed to perform, all previous read or write accesses must be performed. • Example 1: P1: W(x)1 W(x)2; P2: R(x)2 R(x)1 is not allowed (NO) • Example 2: P1: A = 1; • P2: while (A==0); B = 1; • P3: while (B==0); print A; • SC: prints 1 • PC: prints 0 or 1
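
The three-processor example can be replayed with a small event simulation in which every processor keeps its own copy of memory and a write reaches each other processor after a link-specific delay. This is a hedged sketch, not the slides' model: the function `run_pc_example`, the delay parameters, and the per-processor copies are assumptions. Each processor issues only one write here, so the per-sender ordering that processor consistency requires holds trivially.

```python
import heapq


def run_pc_example(delay_p1_to_p3, delay_other=1):
    """Return the value of A that P3 prints once it has observed B == 1."""
    copies = {p: {"A": 0, "B": 0} for p in ("P1", "P2", "P3")}
    events = []  # heap of (arrival_time, sequence_number, receiver, variable, value)
    seq = 0

    def send(now, sender, var, value):
        nonlocal seq
        copies[sender][var] = value  # the writer sees its own write at once
        for receiver in copies:
            if receiver == sender:
                continue
            slow = (sender, receiver) == ("P1", "P3")
            delay = delay_p1_to_p3 if slow else delay_other
            heapq.heappush(events, (now + delay, seq, receiver, var, value))
            seq += 1

    send(0, "P1", "A", 1)  # P1: A = 1
    p2_wrote_b = False
    while events:
        now, _, receiver, var, value = heapq.heappop(events)
        copies[receiver][var] = value
        if receiver == "P2" and copies["P2"]["A"] == 1 and not p2_wrote_b:
            p2_wrote_b = True
            send(now, "P2", "B", 1)  # P2 leaves its loop and writes B = 1
        if receiver == "P3" and copies["P3"]["B"] == 1:
            return copies["P3"]["A"]  # P3 leaves its loop and prints A
    return None
```

When the P1-to-P3 link is as fast as the others, P3 prints 1; when it is slow, for example `run_pc_example(5)`, P2's write to B overtakes P1's write to A on the way to P3 and P3 prints 0.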

  34. Weak Consistency (WC) • Idea: Accesses to shared variables should be done within critical sections; exploit this fact • Memory accesses are distinguished as either data or sync operations. • Rules: 3 memory access conditions • All previous synchronization accesses must be performed before a read or a write access is allowed. • All previous read and write accesses must be performed before a synchronization access is performed. • Synchronization accesses are sequentially consistent with respect to one another. • The WO model ensures that writes always appear atomic to the programmer.

  35. Implementing Weak consistency • Program: Identify/label memory accesses as data or sync operations. • Program construct(s) • Define a special data type • Compiler: translate the high-level intention to machine language: • Associate the type with a particular address (region) • Or, map the special type to, for example, a SYNC instruction, if the hardware provides such primitives. • Hardware support: • Each processor uses a counter to track outstanding transactions.

  36. When should an operation be a sync operation? Race • Given a sequentially consistent execution, an operation forms a race with another operation if • the two operations access the same location; • at least one of the operations is a write; • there are no other intervening operations between the two operations • Example • The operations on Data are data operations, because the write and read of Data will always be separated by the intervening operations of the write and read of Head. • The operations on Head are not always separated by other operations. Therefore, they are sync operations.
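
The race definition translates directly into a scan over adjacent operations of a sequentially consistent execution. The sketch below is illustrative: the `(processor, kind, location)` encoding, the function names, and the sample execution of the Data/Head program are assumptions. It also assumes that the two racing operations come from different processors, which the slide does not state explicitly.

```python
def find_races(execution):
    """Return the locations on which two adjacent operations form a race.

    `execution` is one sequentially consistent total order, given as a list of
    (processor, kind, location) tuples with kind "R" or "W".
    """
    racy = set()
    for (p1, k1, loc1), (p2, k2, loc2) in zip(execution, execution[1:]):
        # Adjacent in the total order means no intervening operations.
        # Assumption: the two operations must come from different processors.
        if p1 != p2 and loc1 == loc2 and "W" in (k1, k2):
            racy.add(loc1)
    return racy


def classify(executions, locations):
    """Label a location "sync" if it races in any given execution, else "data"."""
    racy = set().union(*(find_races(e) for e in executions))
    return {loc: ("sync" if loc in racy else "data") for loc in locations}


# One possible SC execution: P1 writes Data then Head, P2 spins on Head, reads Data.
sample_execution = [
    ("P2", "R", "Head"),
    ("P1", "W", "Data"),
    ("P1", "W", "Head"),
    ("P2", "R", "Head"),
    ("P2", "R", "Data"),
]
```

A full classification would have to consider every sequentially consistent execution of the program; passing a single sample to `classify` only shows the mechanism.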

  37. Programmer-centric view on Weak consistency • Sync or Data? • Decision flow (figure): given a memory location, ask ”Race?” • No: data operation • Yes: sync operation • Don’t know

  38. Hardware support for WC • Each processor uses a counter to track outstanding transactions • The counter is incremented when the processor issues an operation and decremented when a previously issued operation completes • Each processor must ensure that • A sync operation is not issued until all previous operations are complete, i.e., count = 0. • No operations are issued until the previous sync operation completes. (sync completion) • Note: memory operations between two sync operations may still be reordered and overlapped with respect to one another.
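
The counter scheme can be written down as a small piece of issue logic. The class below is a minimal sketch under stated assumptions: the name `WeakOrderingIssueLogic`, its methods, and the `sync_pending` flag are hypothetical and only mirror the two conditions listed on the slide.

```python
class WeakOrderingIssueLogic:
    """Per-processor issue logic for weak ordering, driven by one counter."""

    def __init__(self):
        self.outstanding = 0       # operations issued but not yet complete
        self.sync_pending = False  # True while a sync operation is in flight

    def can_issue(self, is_sync):
        if self.sync_pending:
            return False  # nothing issues until the previous sync completes
        if is_sync:
            return self.outstanding == 0  # a sync waits for count = 0
        return True  # data operations may overlap with one another

    def issue(self, is_sync):
        if not self.can_issue(is_sync):
            raise RuntimeError("operation must stall")
        self.outstanding += 1
        if is_sync:
            self.sync_pending = True

    def complete(self, is_sync):
        self.outstanding -= 1
        if is_sync:
            self.sync_pending = False
```

Data operations issued between two sync operations are never held back by this logic, which reflects the note that they may still be reordered and overlapped.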

  39. Release Consistency (RC) • Idea: Extends weak consistency by distinguishing lock (acquire) and unlock (release) operations on synchronization variables • Rules: 3 memory access conditions • Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully. • Before a release is allowed to be performed, all previous reads and writes by the process must have completed (flush writes) • Accesses to synchronization variables are FIFO consistent (sequential consistency is not required). • Example: valid ordering for release consistency (figure)

  40. Release Consistency • If all accesses to shared variables are surrounded by acquire and release operations, results are the same as with sequential consistency • Blocks of operations within the critical section are made atomic via acquire/release operations
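
In program terms, this is the familiar pattern of wrapping every access to shared data in a lock. The sketch below uses Python's `threading.Lock` as a stand-in for the acquire and release operations; the function `run_counter` and its parameters are made up for the illustration, and it makes no claim about how a particular machine implements release consistency.

```python
import threading


def run_counter(num_threads=4, increments=1000):
    """Increment a shared counter from several threads, always inside a lock."""
    lock = threading.Lock()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()            # acquire: later accesses wait for this
            try:
                shared["count"] += 1  # read and write of the shared variable
            finally:
                lock.release()        # release: earlier accesses are complete

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["count"]
```

Because every update sits between an acquire and a release, the final count equals `num_threads * increments`, the same result a sequentially consistent execution would give.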

  41. Weak vs Release Consistency • Weak consistency (figure): read/write block 1, sync, read/write block 2, sync, read/write block 3 • Release consistency (figure): read/write block 1, acquire (read), read/write block 2, release (write), read/write block 3 • Orders enforced under release consistency: acquire → all; all → release; acquire/release → acquire/release

  42. Weak vs Release Consistency • Weak consistency, S (Sync): R/W after S wait for S; S waits for earlier writes • Release consistency, Acq. (Lock) and Rel. (Unlock): R/W wait for Acq.; Rel. waits for earlier R/W • Release consistency: further relaxes synchronization constraints by distinguishing between Acquire (Lock) and Release (Unlock) operations

  43. Summary of consistency models • Strict Consistency: A read always returns the value of the most recent write to the same memory location • Sequential Consistency: The result of any execution appears as an interleaving of the individual programs, each strictly in sequential program order • Processor Consistency: Writes issued by each processor are in program order, but writes from different processors can be out of order • Weak Consistency: The programmer uses synchronization operators to enforce sequential consistency • Release Consistency: Weak consistency with two synchronization operators: acquire and release. Each operator is guaranteed to be processor-consistent • Figure labels: Uniprocessors, Multiprocessors

  44. Summary of consistency models • For each consistency model: what it is; what conditions and primitives enforce it; what order (relaxation) a processor sees; how it differs from the others.

  45. Conclusion • A memory consistency model is a contract between a shared memory machine and its programs • 3P: programmability, performance and portability • Different consistency models exist. • They have subtle but important differences. • Different performance, overhead, hardware cost, etc. • Programmers prefer an intuitive interface, like SC.
