Explore how adaptive locks merge the benefits of locks and transactional memory, optimizing concurrency in multi-threaded programs. Learn about the rules, costs, and benefits of this synchronization mechanism.
Adaptive Locks: Combining Transactions and Locks for Efficient Concurrency
Takayuki Usui et al.
Introduction
• Computing is becoming more multiprocessor oriented.
• Explicit multithreading is the most direct way to program a parallel system (monitor-style programming).
• Flip side:
  • Interference between threads.
  • Conditions such as deadlocks and races are hard to detect.
  • Fine-grained critical sections are hard to get right, while coarse-grained critical sections reduce concurrency.
Alternatives
• Transactional memory.
• Advantages:
  • Higher-level programming model: no need to know which locks to acquire.
  • No need for fine-grained delineation of critical sections.
• Disadvantages:
  • Livelocks and slower progress.
  • High overhead.
Idea
• Try to combine the advantages of locks and transactional memory.
• How do the authors propose we do that?
• Adaptive locks.
What are adaptive locks?
• A synchronization mechanism combining locks and transactions.
• The programmer specifies critical sections, which are executed either as mutex-protected blocks or atomically as transactions.
How?
• atomic (l1) { code }
  is equivalent to
  atomic { code }
  when executing in transactional mode, or to
  lock(l1); code; unlock(l1);
  when executing in mutex mode.
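As a minimal sketch of what such an expansion could look like, assuming hypothetical runtime helpers al_acquire/al_release and TL2-style stm_begin/stm_commit (these names are illustrative, not the paper's actual API):

    al_t l1;

    void run_critical_section(void) {
        int useTransact = al_acquire(&l1);   /* runtime picks the mode */
        if (useTransact) {
            stm_begin();                     /* transactional mode */
            /* code, with instrumented shared-memory accesses */
            stm_commit();
        } else {
            /* mutex mode: al_acquire() already holds the lock */
            /* code, executed directly */
        }
        al_release(&l1, useTransact);
    }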
How do we decide if it should run as a transaction or as a mutex lock?
• Let us throw out some terminology:
  • Nominal contention.
  • Actual contention.
  • Transactional overhead.
Nominal Contention
• Example (diagram): two threads call a synchronized insert on a shared structure:
    public synchronized void insert(val) { s[size] = val; size++; }
  • Thread 1 calls s.insert(10) and acquires the lock.
  • Thread 2 calls s.insert(20), cannot acquire the lock, and waits.
  • Nominal contention = 1 (see the sketch below).
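As a rough illustration (not the paper's implementation), nominal contention can be thought of as counting how often a thread finds the mutex already held; the counter name below is hypothetical:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t m;
        long nominalContention;              /* hypothetical statistic */
    } counted_lock_t;

    void counted_lock_acquire(counted_lock_t* l) {
        if (pthread_mutex_trylock(&l->m) != 0) {
            /* lock already held: one unit of nominal contention */
            __sync_fetch_and_add(&l->nominalContention, 1);
            pthread_mutex_lock(&l->m);       /* block until it is free */
        }
    }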
Actual Contention
• Example (diagram): both threads run the insert as a transaction:
  • Thread 1 starts first and executes atomic { s.insert(10) }, writing s[0] = 10.
  • Thread 2 tries to execute atomic { s.insert(20) } at the same time, conflicts, and aborts.
  • Actual contention = 1.
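Analogously, a sketch of counting actual contention, assuming the STM invokes a hypothetical callback whenever a transaction guarded by this lock aborts due to a conflict:

    typedef struct {
        long actualContention;               /* hypothetical statistic */
    } al_stats_t;

    /* Assumed to be called by the STM on every conflict-induced abort
       of a transaction associated with this adaptive lock. */
    void on_transaction_abort(al_stats_t* s) {
        s->actualContention++;               /* plain write; see the statistics
                                                optimization discussed later */
    }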
Transactional Overhead
• How much overhead is incurred when the critical section executes in transactional mode rather than in mutex mode.
How are these terms helpful?
• The authors use these concepts to dynamically decide which mode each critical section should execute in.
• Wait, are locks and transactions interchangeable?
• No, they are not. But we will discuss how, under certain high-level correctness criteria, this can be handled.
Contributions of this paper
• An efficient and effective implementation of adaptive locks.
• Trading some accuracy to make the adaptation faster and reduce overhead.
• Conditions under which transactions and mutex locks exhibit equivalent behavior.
• Evaluation of adaptive locks with micro- and macro-benchmarks.
Programming with adaptive locks
• Adaptive locks introduce syntax for labeled atomic sections:
    al_t lock1;
    atomic (lock1) {
        // critical section
    }
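For instance, a small hypothetical program using this syntax (assuming the adaptive-locks C extension and a pthread-style worker; whether the increment runs under a mutex or as a transaction is decided by the runtime, not the programmer):

    al_t counterLock;
    long counter = 0;

    void* worker(void* arg) {
        (void)arg;
        for (int i = 0; i < 1000; i++) {
            atomic (counterLock) {           /* adaptive critical section */
                counter++;                   /* safe in either mode */
            }
        }
        return 0;
    }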
Some rules for using adaptive locks
• The programmer must ensure that the program is still correct if every instance of atomic(lock1) is replaced by a mutex block (mutex mode).
• The programmer must also ensure that the program still runs correctly if all critical sections are executed as transactions (transactional mode).
More rules
• All critical sections associated with the same lock should execute in the same mode.
• The mode of a nested adaptive lock should be the same as that of the surrounding lock.
• Mode switching can be done either for correctness (e.g., I/O operations require mutex mode) or for performance.
Cost-benefit analysis
• Remember the terms we talked about before:
  • Nominal contention.
  • Actual contention.
  • Transactional overhead.
• The authors use these terms to build the decision-making logic.
And the winner is: a · o >= c
• Here a is the actual contention, o is the transactional overhead, and c is the nominal contention.
• If this inequality holds, then mutex mode is preferable (sketch below).
• All these factors are computed separately for each lock, dynamically.
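As a sketch, the decision test could be written as follows (a hypothetical helper, not the paper's actual transactMode implementation):

    /* Per the decision rule above: if a * o >= c, mutex mode is preferable. */
    int prefer_mutex(double a /* actual contention */,
                     double o /* transactional overhead */,
                     double c /* nominal contention */) {
        return a * o >= c;                   /* nonzero means choose mutex mode */
    }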
Implementation and Optimizations
• An extension of the C language.
• The compiler translates each critical section into two object-code versions: one for mutex mode and one for transactional mode.
• Adaptive-lock acquisition replaces regular lock acquisition.
• The adaptive lock's state is packed into a single memory word.
What is contained in the state word
• The number of threads executing in transactional mode: thrdsInStmMode.
• Whether the lock is in mutex mode: mutexMode.
• Whether the mutex lock is held: lockHeld.
• Whether we are currently in the process of changing modes: transition.
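A sketch of how such a packed state word and its accessors might look, mirroring the helper names used in the acquire code below (the exact bit layout is an assumption, not taken from the paper):

    #include <stdint.h>

    /* Hypothetical layout: bit 0 = lockHeld, bit 1 = mutexMode,
       bit 2 = transition, remaining bits = thrdsInStmMode. */
    #define LOCK_HELD_BIT   ((intptr_t)1 << 0)
    #define MUTEX_MODE_BIT  ((intptr_t)1 << 1)
    #define TRANSITION_BIT  ((intptr_t)1 << 2)
    #define STM_SHIFT       3

    static inline int lockHeld(intptr_t s)   { return (s & LOCK_HELD_BIT)  != 0; }
    static inline int mutexMode(intptr_t s)  { return (s & MUTEX_MODE_BIT) != 0; }
    static inline int transition(intptr_t s) { return (s & TRANSITION_BIT) != 0; }
    static inline intptr_t thrdsInStmMode(intptr_t s) { return s >> STM_SHIFT; }

    static inline intptr_t setLockHeld(intptr_t s, int v)
        { return v ? (s | LOCK_HELD_BIT)  : (s & ~LOCK_HELD_BIT); }
    static inline intptr_t setMutexMode(intptr_t s, int v)
        { return v ? (s | MUTEX_MODE_BIT) : (s & ~MUTEX_MODE_BIT); }
    static inline intptr_t setTransition(intptr_t s, int v)
        { return v ? (s | TRANSITION_BIT) : (s & ~TRANSITION_BIT); }
    static inline intptr_t setThrdsInStmMode(intptr_t s, intptr_t n)
        { return (s & (((intptr_t)1 << STM_SHIFT) - 1)) | (n << STM_SHIFT); }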
Acquire is the main routine:

    int acquire(al_t* lock) {
        int spins = 0;
        int useTransact = 0;
        INC(lock->thdsBlocked);
        while (1) {
            intptr_t prev, next;
            prev = lock->state;
            if (transition(prev) == 0) {             /* no mode change pending */
                if ((useTransact = transactMode(lock, spins))) {
                    /* prefer transactional mode */
                    if (lockHeld(prev) == 0) {
                        next = setMutexMode(prev, 0);
                        next = setThrdsInStmMode(next, thrdsInStmMode(next) + 1);
                        if (CAS(lock->state, prev, next) == prev) break;
                    } else {
                        /* mutex holder still inside: request a mode transition */
                        next = setMutexMode(prev, 0);
                        next = setTransition(next, 1);
                        CAS(lock->state, prev, next);
                    }
                } else {
                    /* prefer mutex mode */
                    if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
                        next = setMutexMode(prev, 1);
                        next = setLockHeld(next, 1);
                        if (CAS(lock->state, prev, next) == prev) break;
                    } else if (mutexMode(prev) == 0) {
                        /* transactions still running: request a mode transition */
                        next = setMutexMode(prev, 1);
                        next = setTransition(next, 1);
                        CAS(lock->state, prev, next);
                    }
                }
            } else {                                 /* finish a pending transition */
                if (mutexMode(prev) == 0) {
                    if (lockHeld(prev) == 0) {
                        useTransact = 1;
                        next = setThrdsInStmMode(prev, thrdsInStmMode(prev) + 1);
                        next = setTransition(next, 0);
                        if (CAS(lock->state, prev, next) == prev) break;
                    }
                } else {
                    if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
                        useTransact = 0;
                        next = setLockHeld(prev, 1);
                        next = setTransition(next, 0);
                        if (CAS(lock->state, prev, next) == prev) break;
                    }
                }
            }
            if (spin_thrld < ++spins) Yield();
        } /* end while(1) */
        DEC(lock->thdsBlocked);
        return useTransact;
    }
Performance Optimizations
• Threads need to update variables that keep counts and compute the statistics used for adaptive reasoning.
• Remember a (actual contention).
• Instead of atomically updating the shared statistic on every event, threads use regular (non-atomic) writes, and the shared global value is updated from those (see the sketch below).
• This can give rise to write-write races, but the authors argue that sporadic inaccuracies in the statistics are not significant.
• Also note that inaccuracies in the statistics cannot cause wrong program execution; at worst, the other mode is chosen for executing the critical sections.
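One way this style of statistics update could look (names, interval, and layout are assumptions): each thread accumulates a thread-local count and occasionally publishes it with a plain, non-atomic update, accepting benign write-write races on the shared total.

    static long globalActualContention;      /* shared statistic */
    static __thread long localAborts;        /* per-thread accumulator */

    void note_abort(void) {
        localAborts++;                       /* thread-local, race-free */
        if (localAborts % 64 == 0) {
            /* plain, non-atomic read-modify-write: concurrent updates may be
               lost, but the statistic only guides the mode choice, so
               occasional inaccuracy is acceptable. */
            globalActualContention += 64;
        }
    }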
Performance Optimizations contd.
• The atomic increment and decrement of the variable lock->thdsBlocked is also avoided.
• The atomic increment and decrement are performed only if the thread actually spins; otherwise they are skipped. This differs from the code shown earlier.
Performance Optimizations contd.

Before:

    int acquire(al_t* lock) {
        int spins = 0;
        ...
        INC(lock->thdsBlocked);
        while (1) {
            ...   // try to acquire,
                  // break if successful
            if (spin_thrld < ++spins) Yield();
        }
        DEC(lock->thdsBlocked);
        ...
    }

After:

    int acquire(al_t* lock) {
        int spins = 0;
        ...
        while (1) {
            ...   // try to acquire,
                  // break if successful
            if (spins == 0) INC(lock->thdsBlocked);
            if (spin_thrld < ++spins) Yield();
        }
        if (0 < spins) DEC(lock->thdsBlocked);
        ...
    }
Performance Optimizations contd.
• o (the transactional overhead) depends on shared-memory updates.
• To keep the estimate of o realistic but inexpensive:
  • It is recalculated only at regular intervals.
  • The number of shared-memory accesses in the sampled transaction is recorded and multiplied by a static estimate of how much each access costs.
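One way this sampling could look (illustrative only; the interval and the per-access cost constant are assumptions, not the paper's values):

    #define SAMPLE_INTERVAL 100      /* re-estimate o every 100th transaction */
    #define PER_ACCESS_COST 0.2      /* assumed static cost of one instrumented
                                        access, relative to an uninstrumented one */

    static double estimate_overhead(long txCount, long accessesInSampledTx) {
        /* Only every SAMPLE_INTERVAL-th transaction counts its shared-memory
           accesses; the count times a static per-access cost estimates o. */
        if (txCount % SAMPLE_INTERVAL != 0)
            return -1.0;             /* no new estimate this time */
        return accessesInSampledTx * PER_ACCESS_COST;
    }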
Reality Check
• But wait: is interchanging locks and transactions legal? Are they equivalent?
• Answer: no, they are not equivalent.
• More specifically, it depends on the STM system. TL2, the STM used by the authors, can behave differently when locks and transactions are used interchangeably.
No more boring bullets, we are not MBA students.
(Diagram: a problematic interleaving.)
• Thread 2 commits its transaction but does not yet copy the value back to memory.
• Thread 1 then commits and removes the first item; at that point r1 and r2 see stale values.
• Thread 2 eventually updates the value.
So how can we fix this?
• A simple observation: there should be a lock associated with every shared memory location.
• Every access to such a location should be performed with that lock held (see the example below).
• This is the standard lockset well-formedness criterion for multithreaded programs.
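A small example of the criterion using an adaptive lock (names and data structure are hypothetical): every access to the shared list happens inside an atomic section labeled with the same lock, so the program is correct in both mutex and transactional mode.

    typedef struct node { int value; struct node* next; } node_t;

    al_t listLock;                   /* adaptive lock guarding the list */
    node_t* head;                    /* shared data: only touched under listLock */

    int pop_front(void) {
        int value = -1;
        atomic (listLock) {          /* every access to head and to node fields
                                        happens inside a section labeled listLock */
            if (head != 0) {
                value = head->value;
                head = head->next;
            }
        }
        return value;
    }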
Some results
• Tested with micro- and macro-benchmarks.
• Micro-benchmarks: red-black trees (where STM wins), splay trees (where mutex locks win), and fine-grained hash tables; adaptive locks were as good as the better concurrency mechanism in each case.
• Macro-benchmarks: STAMP (Stanford Transactional Applications for Multi-Processing).