1k likes | 1.15k Views
By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson. Shared Memory Consistency Models: A Tutorial. Outline. Concurrent programming on a uniprocessor The effect of o ptimizations on a uniprocessor The e ffect of the same optimizations on a multiprocessor
E N D
By SaritaAdve & KouroshGharachorloo Slides by Jim Larson Shared Memory Consistency Models:A Tutorial
Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion
Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Critical Section is Protected Works the same if Process 2 runs first! Process 2 enters its Critical Section
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0 Arbitrary interleaving of Processes
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Arbitrary interleaving of Processes
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Both processes can block but the critical section remains protected. Deadlock can be fixed by extending the algorithm with turn-taking
Outline • Concurrent Programming on a Uniprocessor • The effect of optimizations on a Uniprocessor • The effect of the same optimizations on a Multiprocessor without Sequential Consistency • Methods for restoring Sequential Consistency • Conclusion
SpeedUp: Write takes 100 cycles, buffering takes 1 cycle. So Buffer and keep going. Problem: Read from a Location with a buffered Write pending?? (Single Processor Case) Optimization: Write Buffer with Bypass
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Write Buffering
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Uh-Oh!
SpeedUp: Write takes 100 cycles, buffering takes 1 cycle. Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete. Optimization: Write Buffer with Bypass
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section STALL! Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors??
Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors?
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 What Now?? Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 How Did That Happen?? Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Processor 2 knows nothing about the write to Flag1, so has no reason to stall! Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
A more general way to look at the Problem: Reordering of Reads and Writes (Loads and Stores).
Consider the Instructions in these processes. Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Simplify as: WX WY RX RY
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY There are 4! or 24 possible orderings. If either WX<RX or WY<RY Then the Critical Section is protected (Correct Behavior).
WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX There are 4! or 24 possible orderings. If either WX<RX or WY<RY Then the Critical Section is protected (Correct Behavior) 18 of the 24 orderings are OK. But the other 6 are trouble!
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Wrong Data! Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate. Fix: Write must be acknowledged before another write (or read) from the same processor.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data (0) Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data (0) Memory Interconnect Head = 0 Data = 2000 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.