ECE729 : Advance Computer Architecture

ECE729 : Advance Computer Architecture. Lecture 26: Synchronization, Memory Consistency. 25th March, 2010. Synchronization Problem: processes run on different processors independently. At some point they need to know each other's status for communication, mutual exclusion, etc.


Presentation Transcript


  1. ECE729 : Advance Computer Architecture. Lecture 26: Synchronization, Memory Consistency. 25th March, 2010. Anshul Kumar, CSE IITD

  2. Synchronization Problem
     • Processes run on different processors independently
     • At some point they need to know each other's status for
       • communication
       • mutual exclusion, etc.
     • Hardware primitives are required for these operations

  3. Consider an example
     Bank transaction from account number A:
     • b = read_bal (A)
     • b = b - debit_amt
     • if b >= bmin update_bal (A, b)

  4. Two concurrent transactions
     Transaction 1:
       b1 = read_bal (A)
       b1 = b1 - debit_amt1
       if b1 >= bmin update_bal (A, b1)
     Transaction 2:
       b2 = read_bal (A)
       b2 = b2 - debit_amt2
       if b2 >= bmin update_bal (A, b2)

  5. Two concurrent transactions: serialize reads and writes
     Transaction 1 and Transaction 2 are the same as on slide 4; their reads and writes of account A have to be serialized.
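The need to serialize can be made concrete by replaying the two transactions step by step. The sketch below is only an illustration: the function `run`, the starting balance of 100, `bmin = 0`, and the debit amounts 30 and 50 are assumptions chosen for the example, not values from the lecture.

```python
def run(schedule, balance=100, bmin=0, debits=(30, 50)):
    """Replay one interleaving of two debit transactions on a shared balance.

    schedule is a list of (transaction_index, step) pairs, where step is
    "read"   (b = read_bal(A); b = b - debit_amt) or
    "update" (if b >= bmin: update_bal(A, b)).
    """
    account = {"A": balance}
    local = [None, None]              # private copies b1 and b2
    for t, step in schedule:
        if step == "read":
            local[t] = account["A"] - debits[t]
        elif step == "update" and local[t] >= bmin:
            account["A"] = local[t]
    return account["A"]


serial = [(0, "read"), (0, "update"), (1, "read"), (1, "update")]
racy = [(0, "read"), (1, "read"), (0, "update"), (1, "update")]

print(run(serial))  # 20: both debits are applied
print(run(racy))    # 50: the debit of 30 is lost
```

In the racy schedule both transactions read the old balance, so the second update overwrites the first.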

  6. Lock for mutual exclusion
     Transaction 1:
       acquire: x1 = read (lock)
                if x1 = 1 then goto acquire
                set (lock)
                do transaction1 …
       release: clear (lock)
     Transaction 2:
       acquire: x2 = read (lock)
                if x2 = 1 then goto acquire
                set (lock)
                do transaction2 …
       release: clear (lock)

  7. Lock for mutual exclusion (continued; same code as slide 6)
     Note that read (lock) and set (lock) are separate steps: if both transactions read lock = 0 before either executes set (lock), both enter the transaction.
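Why a lock built from a separate read and set is not enough can be shown with a small replay. This is a sketch under assumptions: the generator `transaction`, the helper `replay`, and the idea of using `yield` as the point where the other processor may run are all illustrative devices, not part of the lecture.

```python
def transaction(name, mem, log):
    """Lock code with a non-atomic read followed by set.
    Each yield marks a point where the other processor may run."""
    while True:
        x = mem["lock"]               # x = read(lock)
        yield
        if x == 0:
            break                     # lock looked free
    mem["lock"] = 1                   # set(lock)
    yield
    log.append(name + " enters")
    yield
    log.append(name + " leaves")
    mem["lock"] = 0                   # clear(lock)


def replay(order):
    """Run T1 and T2 one step at a time in the given order."""
    mem, log = {"lock": 0}, []
    procs = {"T1": transaction("T1", mem, log),
             "T2": transaction("T2", mem, log)}
    for name in order:
        next(procs[name], None)
    return log


print(replay(["T1"] * 4 + ["T2"] * 4))
# ['T1 enters', 'T1 leaves', 'T2 enters', 'T2 leaves']
print(replay(["T1", "T2"] * 4))
# ['T1 enters', 'T2 enters', 'T1 leaves', 'T2 leaves']
```

In the alternating schedule both transactions see `lock = 0` before either sets it, so both are inside the critical section at once.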

  8. Synchronization Primitives
     Hardware primitive required
     • Should have an atomic read+write operation
     • Examples:
       • test&set
       • exchange
       • fetch&increment
       • load linked, store conditional

  9. Spin Lock with Exchange Instr.
     Lock: 0 indicates free and 1 indicates locked
     Code to lock X:
               r2 ← 1
       lockit: r2 ↔ X             ;atomic exchange
               if(r2≠0)lockit     ;already locked
     Locks are cached for efficiency, coherence is used
     Better code to lock X:
       lockit: r2 ← X             ;read lock
               if(r2≠0)lockit     ;not available
               r2 ← 1
               r2 ↔ X             ;atomic exchange
               if(r2≠0)lockit     ;already locked
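The "better code" above (spin on a plain read, then try the exchange) can be sketched in Python. Python has no atomic exchange instruction, so the class `AtomicWord` below is an assumption: its internal `threading.Lock` only stands in for the atomicity that hardware would provide. The names `acquire`, `release`, and `demo` are likewise illustrative.

```python
import threading
import time


class AtomicWord:
    """One memory word with an atomic exchange (hardware stand-in)."""

    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()   # models hardware atomicity only

    def load(self):
        return self._value

    def store(self, value):
        with self._hw:
            self._value = value

    def exchange(self, new):
        with self._hw:
            old, self._value = self._value, new
            return old


def acquire(lock):
    while True:
        while lock.load() != 0:       # read lock; spin while not available
            time.sleep(0)             # let other Python threads run
        if lock.exchange(1) == 0:     # atomic exchange; 0 means we own it
            return


def release(lock):
    lock.store(0)


def demo(threads=4, iterations=200):
    lock, shared = AtomicWord(0), {"count": 0}

    def worker():
        for _ in range(iterations):
            acquire(lock)
            shared["count"] += 1      # critical section
            release(lock)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return shared["count"]


print(demo())  # 800
```

Spinning on `load` first keeps the waiting processors reading their cached copy; the exchange is attempted only when the lock appears free.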

  10. LD Linked & ST conditional
      Simpler to implement
      • atomic exchange r2 ↔ X using LL and SC
        try: r3 ← r2          ;move exchange value
             LL r1, X          ;load linked
             SC r3, X          ;store conditional
             if(r3=0)try       ;branch, store fails
             r2 ← r1           ;put loaded value in r2
      • fetch&increment using LL and SC
        try: LL r1, X          ;load locked
             r3 ← r1 + 1       ;increment
             SC r3, X          ;store conditional
             if(r3=0)try       ;branch, store fails
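The retry loops above can be exercised against a toy model of LL/SC. Everything here is an assumption made for illustration: the class `LLSCMemory` models a single word whose links are broken by any write, and the `interfere` hook exists only to inject another processor's write between LL and SC.

```python
class LLSCMemory:
    """Toy single-word memory with load-linked / store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self.links = set()            # processors holding a valid link

    def ll(self, pid):
        self.links.add(pid)
        return self.value

    def store(self, value):
        self.value = value
        self.links.clear()            # any write breaks every link

    def sc(self, pid, value):
        if pid not in self.links:
            return 0                  # store fails
        self.store(value)
        return 1                      # store succeeds


def fetch_and_increment(mem, pid, interfere=None):
    """Retry until SC succeeds; interfere (if given) runs once between
    LL and SC to mimic a write by another processor."""
    while True:
        old = mem.ll(pid)
        if interfere:
            interfere(mem)
            interfere = None
        if mem.sc(pid, old + 1):
            return old


def exchange(mem, pid, new):
    while True:
        old = mem.ll(pid)
        if mem.sc(pid, new):
            return old


mem = LLSCMemory(5)
print(fetch_and_increment(mem, 0))                               # 5
print(fetch_and_increment(mem, 0, lambda m: m.store(10)))        # 10
print(mem.value)                                                 # 11
```

In the second call the first SC fails because the injected write broke the link, so the loop retries and increments the new value.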

  11. Spin Lock with LL & SC
        lockit: LL r2, X          ;load locked
                if(r2≠0)lockit    ;not available
                r2 ← 1
                SC r2, X          ;store cond
                if(r2=0)lockit    ;branch, store fails
      • performance in presence of contention?
      • spin lock with exponential back-off reduces contention
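Exponential back-off can be sketched independently of the lock primitive. In the sketch below, `try_lock` and `pause` are hypothetical caller-supplied hooks, and the `base` and `cap` values are arbitrary assumptions; real implementations often add randomness to the delay, which is left out here.

```python
def acquire_with_backoff(try_lock, pause, base=1, cap=64):
    """Call try_lock() until it succeeds, pausing longer after each failure.

    try_lock returns True on success; pause(n) waits for n time units.
    Returns the number of failed attempts.
    """
    delay, failures = base, 0
    while not try_lock():
        pause(delay)
        delay = min(cap, delay * 2)   # double the wait, up to a ceiling
        failures += 1
    return failures


attempts = iter([False, False, False, False, True])
waits = []
acquire_with_backoff(lambda: next(attempts), waits.append, base=1, cap=4)
print(waits)  # [1, 2, 4, 4]
```

Each failed attempt keeps the processor away from the lock for longer, so fewer exchange or SC operations compete when many processors are waiting.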

  12. Barrier Synchronization
        lock (X)
        if(count=0) release ← 0
        count++
        unlock(X)
        if(count=total){count ← 0; release ← 1}
        else spin(release=1)

  13. Improved Barrier Synch.
        local_sense ← !local_sense
        lock (X)
        count++
        unlock(X)
        if(count = total) {count ← 0; release ← local_sense}
        else {spin(release = local_sense)}
      Tree-based barrier reduces contention
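A sense-reversing barrier along these lines can be sketched with Python threads. This is an illustration, not the lecture's code: the class `SenseBarrier` and the helper `demo` are assumed names, and, as a deliberate variation, the sketch decides whether a thread is the last arrival while still holding the lock rather than re-reading `count` after unlocking.

```python
import threading
import time


class SenseBarrier:
    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self.lock = threading.Lock()
        self.local = threading.local()      # holds each thread's local_sense

    def wait(self):
        sense = not getattr(self.local, "sense", False)
        self.local.sense = sense            # local_sense = !local_sense
        with self.lock:
            self.count += 1
            last = self.count == self.total
            if last:
                self.count = 0              # reset for the next use
        if last:
            self.release = sense            # let the waiting threads go
        else:
            while self.release != sense:    # spin(release = local_sense)
                time.sleep(0)


def demo(threads=4, phases=3):
    barrier, log, log_lock = SenseBarrier(threads), [], threading.Lock()

    def worker():
        for phase in range(phases):
            with log_lock:
                log.append(phase)
            barrier.wait()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return log


print(demo())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Because each use of the barrier waits for the opposite value of `release`, a fast thread that re-enters the barrier cannot be confused with the previous episode.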

  14. Memory Consistency Problem
      • When must a processor see the value that has been written by another processor? Atomicity of operations: system wide?
      • Can memory operations be re-ordered?
      Various models: http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps

  15. Example
        P1:                    P2:
            A = 0                  B = 0
            ...                    ...
            A = 1                  B = 1
        L1: if(B=0)S1          L2: if(A=0)S2
      Which statements among S1 and S2 are done?
      Both S1, S2 may be done if writes are delayed

  16. Sequential Consistency
      • result of any execution is the same as if the operations of all processors were executed in some sequential order
      • operations of each processor occur in the order specified by its program
      - it requires all memory operations to be atomic
      - too restrictive, high overheads
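The definition can be checked on the two-processor example by enumerating every sequential order that keeps each program's own order. The sketch below is an illustration; the encoding of operations as tuples, the register names `r1` and `r2`, and the helper names are assumptions.

```python
from itertools import combinations

# P1: A = 1; r1 = B        P2: B = 1; r2 = A
P1 = [("write", "A", 1), ("read", "B", "r1")]
P2 = [("write", "B", 1), ("read", "A", "r2")]


def interleavings(p, q):
    """All merges of p and q that keep each program's own order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        pi, qi = iter(p), iter(q)
        yield [next(pi) if i in slots else next(qi) for i in range(n)]


def sc_outcomes(p, q):
    results = set()
    for order in interleavings(p, q):
        mem, regs = {"A": 0, "B": 0}, {}
        for kind, addr, arg in order:
            if kind == "write":
                mem[addr] = arg
            else:
                regs[arg] = mem[addr]
        results.add((regs["r1"], regs["r2"]))
    return results


print(sorted(sc_outcomes(P1, P2)))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome `(0, 0)` never appears, so under sequential consistency at most one of the two processors can read 0.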

  17. Relaxing WR order Loads are allowed to overtake stores Write buffering is permitted • Total Store Ordering : Writes are atomic • Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate Anshul Kumar, CSE IITD

  18. Relaxing WR & WW order Partial Store Ordering • Loads are allowed to overtake stores • Writes can be re-ordered • Memory barrier or fence are used to explicitly order any operations Further improves the performance Anshul Kumar, CSE IITD

  19. Examples
      Example 1:
        P1                  P2
        A = 1;              while(flag=0);
        flag = 1;           print A;
      SC ensures that “1” is printed. TSO, PC also do so; PSO does not.
      Example 2:
        P1                  P2
        A = 1;              print B;
        B = 1;              print A;
      SC ensures that if B is printed as “1” then A is also printed as “1”. TSO, PC also do so; PSO does not.
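The first example (the flag hand-off) can be checked with a small enumeration. The sketch rests on simplifying assumptions: P1's two writes reach memory either in program order only (as under TSO) or in any order (as PSO allows for different addresses), P2's two reads stay in program order, and the loop on `flag` is reduced to a single read that is kept only when it returns 1. The names `merges` and `printed_values` are illustrative.

```python
from itertools import combinations, permutations

WRITES = [("A", 1), ("flag", 1)]      # P1, in program order
READS = ["flag", "A"]                 # P2: test flag, then print A


def merges(p, q):
    """All merges of p and q that keep the order inside p and inside q."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        pi, qi = iter(p), iter(q)
        yield [next(pi) if i in slots else next(qi) for i in range(n)]


def printed_values(allow_write_reorder):
    """Values P2 can print for A in runs where it saw flag = 1."""
    if allow_write_reorder:
        commit_orders = list(permutations(WRITES))
    else:
        commit_orders = [tuple(WRITES)]
    seen = set()
    for commits in commit_orders:
        for order in merges(list(commits), READS):
            mem, got = {"A": 0, "flag": 0}, {}
            for op in order:
                if isinstance(op, tuple):
                    mem[op[0]] = op[1]        # a write reaches memory
                else:
                    got[op] = mem[op]         # a read by P2
            if got["flag"] == 1:
                seen.add(got["A"])
    return seen


print(sorted(printed_values(False)))  # [1]
print(sorted(printed_values(True)))   # [0, 1]
```

When the writes may be reordered, `flag = 1` can become visible before `A = 1`, so P2 may print 0 unless a memory barrier is placed between the two writes.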

  20. Examples - continued
      Example 3:
        P1            P2                P3
        A = 1;        while(A=0);       while(B=0);
                      B = 1;            print A;
      SC ensures that “1” is printed. TSO and PSO also do that but PC does not.
      Example 4:
        P1            P2
        A = 1;        B = 1;
        print B;      print A;
      SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not.

  21. Relaxing all R/W order
      Weak Ordering or Weak Consistency
      • Loads and stores are not restricted to follow an order
      • Explicit synchronization primitives are used
      • Synchronization primitives follow a strict order
      • Easy to achieve
      • Low overhead

  22. Release Consistency
      • Further relaxation of weak ordering
      • Synch primitives are divided into acquire and release operations
      • R/W operations after an acquire cannot move before it, but those before it can be moved after it
      • R/W operations before a release cannot move after it, but those after it can be moved before it

  23. WC and RC Comparison
        WC                    RC
        R/W … R/W  (1)        R/W … R/W  (1)
        synch                 acquire
        R/W … R/W  (2)        R/W … R/W  (2)
        synch                 release
        R/W … R/W  (3)        R/W … R/W  (3)
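The difference between the two columns can be written down as a reordering rule for two adjacent operations of one processor. This is a sketch of the rules as stated on the slides, under assumptions: the function name `may_swap` and the operation labels are illustrative, and same-address data dependences between ordinary accesses are ignored.

```python
def may_swap(model, first, second):
    """Can two adjacent operations of one processor be reordered?

    first precedes second in program order. Operations are "rw" (an
    ordinary read or write), "synch" (WC), "acquire" or "release" (RC).
    """
    if first == "rw" and second == "rw":
        return True                   # ordinary accesses are unordered
    if model == "RC":
        if (first, second) == ("rw", "acquire"):
            return True               # earlier access may move after acquire
        if (first, second) == ("release", "rw"):
            return True               # later access may move before release
    return False                      # everything else keeps program order


print(may_swap("WC", "rw", "synch"))     # False
print(may_swap("RC", "rw", "acquire"))   # True
print(may_swap("RC", "acquire", "rw"))   # False
print(may_swap("RC", "release", "rw"))   # True
```

Under WC the blocks (1), (2) and (3) stay separated by the synch operations; under RC block (1) may overlap with (2), and block (3) may overlap with (2), but (2) cannot leave the acquire/release pair.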
