Learn about the necessity of process coordination using lock variables and the importance of architecture-supported arbitration mechanisms to manage shared resources efficiently. This involves understanding atomic Compare and Swap operations and the implementation of lock algorithms to prevent race conditions. Explore the concepts of spin locks, sleep locks, memory fences, and locking variants for effective synchronization strategies.
Implementation of Locks David Gregg Trinity College Dublin
Process Synchronization • Need to be able to coordinate processes working on a common task • Lock variables (mutexes) are used to coordinate or synchronize processes • Need an architecture-supported arbitration mechanism to decide which processor gets access to the lock variable • Single bus provides arbitration mechanism, since the bus is the only path to memory – the processor that gets the bus wins • Need an architecture-supported operation that locks the variable • Locking can be done via an atomic compare and swap operation (processor can both read a location and set it to the locked state – test-and-set – in the same bus operation)
Atomic Compare and Swap • Machine instruction used to synchronize threads • Threads may be running on different cores • Compare and swap acts like a simple C function:

int compare_and_swap(int * address, int testval, int newval)
{
  int old_val = *address;
  if ( old_val == testval ) {
    *address = newval;
  }
  return old_val;
}
Atomic Compare and Swap • Compare and swap (CAS) operation completes atomically • There is no possibility of the memory location being modified between the CAS reading the location and writing to it • Can be used to implement locks • E.g., 0 means unlocked, 1 means locked:

value = compare_and_swap(lock, 0, 1);
if ( value == 0 )
  /* we have the lock */
else
  /* we failed to acquire the lock */
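On real hardware, CAS is exposed as a machine instruction or a compiler intrinsic. As a rough sketch, the compare_and_swap function above could be realised with GCC's __sync_val_compare_and_swap builtin (the choice of this builtin is an illustrative assumption; the slides do not name a particular toolchain):

static inline int compare_and_swap(int * address, int testval, int newval)
{
  /* returns the value that was in *address before the operation,
     matching the C sketch of compare_and_swap above */
  return __sync_val_compare_and_swap(address, testval, newval);
}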
Atomic Compare and Swap • Lock algorithms can be built using compare and swap operations • Most locks use compare and swap to attempt to acquire the lock • Some specialist lock algorithms use other types of instructions • Major difference between different locking algorithms is what to do if you fail to acquire the lock • Two main choices • Go into a loop and repeatedly try to acquire the lock • Known as a spin lock • Keeps spinning until another thread releases the lock • Put the current thread to sleep and add it to a queue of threads waiting for the lock to be released
Atomic Compare and Swap • Spin lock algorithms can be built using compare and swap operations

void acquire_lock(int * lock)
{
  int old_val;
  do {
    old_val = compare_and_swap(lock, 0, 1);
  } while ( old_val == 1 );   /* keep trying until we read 0 (unlocked) */
}

void release_lock(int * lock)
{
  fence();     /* maybe we need a memory fence on some architectures */
  *lock = 0;
}
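A minimal usage sketch, assuming the acquire_lock/release_lock functions above and a working compare_and_swap: two POSIX threads increment a shared counter, and the spin lock ensures each increment is applied without interference.

#include <pthread.h>
#include <stdio.h>

int counter_lock = 0;      /* 0 = unlocked, 1 = locked */
int shared_counter = 0;    /* shared data protected by the lock */

void * worker(void * arg)
{
  for (int i = 0; i < 100000; i++) {
    acquire_lock(&counter_lock);
    shared_counter++;               /* inside the critical section */
    release_lock(&counter_lock);
  }
  return NULL;
}

int main(void)
{
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("%d\n", shared_counter);   /* expect 200000 */
  return 0;
}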
Spin Lock Synchronization (flowchart) • Lock: read the lock variable; while it is not 0 (unlocked), keep spinning • When it reads 0, try to lock the variable using an atomic swap: read the lock variable and set it to the locked value (1) in one atomic operation • If the swap read a 0, we succeeded and can begin the update of the shared data; otherwise go back to spinning • Unlock: after finishing the update of the shared data, set the lock variable back to 0 • The single winning processor will read a 0 - all other processors will read the 1 set by the winning processor
Memory fences • Modern superscalar processors typically execute instructions out of order • In a different order to the original program order • Loads and stores can be executed out of order • The processor may speculate that a load takes its data from a different address to an earlier store • Or the processor may already know the load and store addresses, and thus be able to tell which accesses conflict and which are independent
Memory fences • Each core typically has a load/store queue • Stores are written to the queue, and will be flushed to memory “eventually” • A load may take data directly from the load/store queue if there has been a recent store to the location • So a load instruction may never reach memory • If there is already a store to a memory location in the load/store queue, an additional store may overwrite the existing store
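A memory fence forces the earlier loads and stores to become visible before later ones, which is why release_lock above calls fence() before clearing the lock. One way to realise that placeholder is the C11 fence from <stdatomic.h>; this is a hedged sketch, since the slides do not specify a particular API:

#include <stdatomic.h>

void release_lock(int * lock)
{
  /* order all stores made in the critical section before the store
     that frees the lock, so other cores see the updated shared data */
  atomic_thread_fence(memory_order_release);
  *lock = 0;
}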
Atomic Compare and Swap • Spin locks are usually faster than putting a waiting thread to sleep • The waiting thread acquires the lock as soon as it is released • No need to interact with the operating system • Works well when the lock is seldom contended • But wasteful of resources and energy when the lock is contended • The thread busy-spins without doing any useful work • Large number of memory operations on the bus • The disaster scenario is spinning while waiting for a thread that is not currently running
Atomic Compare and Swap • Sleeping locks are usually slower • Need an operating system call to put the thread to sleep • Need to manage a queue of waiting threads • Every time the lock is released, you must check the list of waiters • Sleeping locks make better use of total resources • The core is released while the thread waits • Works well when the lock is often contended and there are more threads than cores • Especially important in virtualized systems • Many lock algorithms are hybrids • Spin for a little while to see if the lock is released quickly • Then go to sleep if it is not • Used by the Linux futex (fast mutex) library (see the sketch below)
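A real futex-based lock blocks in the kernel until the holder wakes it; as a simplified sketch of the spin-then-sleep idea, the loop below spins a bounded number of times and then gives up the core with sched_yield(). The SPIN_LIMIT value and the use of sched_yield() rather than a true futex wait are illustrative assumptions, not the Linux implementation.

#include <sched.h>

#define SPIN_LIMIT 1000   /* illustrative bound, not from the slides */

void acquire_lock_hybrid(int * lock)
{
  int spins = 0;
  while (compare_and_swap(lock, 0, 1) != 0) {
    if (++spins >= SPIN_LIMIT) {
      /* stop burning cycles; a futex-based lock would sleep here */
      sched_yield();
      spins = 0;
    }
  }
}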
Locking variants • There are many variants of these basic locks • Spin locks with “back off” (see the sketch below) • If the lock is contended, the thread waits a while before trying again • Avoids the waiting threads constantly trying to access the cache line containing the lock • Can help where the lock is contended by several threads • Queueing spin locks • Make the queueing fairer
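A rough sketch of a spin lock with exponential back-off, assuming the compare_and_swap function from earlier; the usleep delay and the starting/maximum delay values are illustrative assumptions:

#include <unistd.h>

void acquire_lock_backoff(int * lock)
{
  unsigned delay = 1;      /* microseconds; illustrative start value */
  while (compare_and_swap(lock, 0, 1) != 0) {
    usleep(delay);         /* back off before touching the lock's cache line again */
    if (delay < 1024)
      delay *= 2;          /* exponential back-off, capped */
  }
}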
Atomic Increment • Alternative to atomic compare and swap • Atomically increments a value in memory • Can be used for multithreaded counting without locks • Atomic increment locks • Can build locks from atomic increment • Known as a ticket lock • Based on the ticket system in a passport office, the GNIB or Argos • Queueing spin lock • Multiple threads can be waiting for a lock • With a regular spin lock, all waiting threads compete for the lock when it becomes free • With a queueing spin lock, the waiting threads form an orderly FIFO queue • Guarantees fairness
Atomic Increment

struct ticket_lock {
  volatile int dispense;          /* next ticket to hand out */
  volatile int current_ticket;    /* ticket currently being served */
};

void acquire_lock(struct ticket_lock * lock)
{
  /* assumes atomic_increment returns the value *before* the increment
     (fetch-and-add semantics) */
  int my_ticket = atomic_increment(&(lock->dispense));
  while ( my_ticket != lock->current_ticket )
    ;   /* spin until our ticket is called */
}

void unlock(struct ticket_lock * lock)
{
  lock->current_ticket++;   /* hand the lock to the next waiting ticket */
}
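The atomic_increment primitive assumed above can be mapped to a real compiler builtin. A hedged sketch using GCC's __sync_fetch_and_add (an assumption about the toolchain, not something the slides specify), which also illustrates the lock-free counting mentioned earlier:

static inline int atomic_increment(volatile int * address)
{
  /* returns the value of *address before the increment */
  return __sync_fetch_and_add(address, 1);
}

/* lock-free multithreaded counting: each thread simply calls
   atomic_increment(&event_count); no lock around the counter is needed */
volatile int event_count = 0;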