Implementation of Locks David Gregg Trinity College Dublin
Process Synchronization • Need to be able to coordinate processes working on a common task • Lock variables (mutexes) are used to coordinate or synchronize processes • Need an architecture-supported arbitration mechanism to decide which processor gets access to the lock variable • Single bus provides arbitration mechanism, since the bus is the only path to memory – the processor that gets the bus wins • Need an architecture-supported operation that locks the variable • Locking can be done via an atomic compare and swap operation (processor can both read a location and set it to the locked state – test-and-set – in the same bus operation)
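The test-and-set operation mentioned in the last bullet can be written as a short pseudocode sketch (an illustration only – the hardware performs the read and the write as one indivisible bus operation, which plain C code cannot express):

    int test_and_set(int * address)
    {
      int old_val = *address;   /* read the current value of the lock variable */
      *address = 1;             /* unconditionally set it to the locked state */
      return old_val;           /* 0 means we acquired the lock, 1 means it was already held */
    }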
Atomic Compare and Swap • Machine instruction used to synchronize threads • Threads may be running on different cores • Compare and swap acts like a simple C function

    int compare_and_swap(int * address, int testval, int newval)
    {
      int old_val = *address;     /* read the current value */
      if (old_val == testval) {   /* only write if it matches the test value */
        *address = newval;
      }
      return old_val;             /* always return the value that was read */
    }
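As a sketch of how this pseudocode maps onto a real primitive: the GCC/Clang __sync_val_compare_and_swap builtin (assuming a compiler that provides it) has the same behaviour, returning the value that was in memory before the operation:

    int compare_and_swap(int * address, int testval, int newval)
    {
      /* the builtin performs the read, compare and write as one atomic operation */
      return __sync_val_compare_and_swap(address, testval, newval);
    }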
Atomic Compare and Swap • Compare and swap (CAS) operation completes atomically • There is no possibility of the memory location being modified between the CAS reading the location and writing to it • Can be used to implement locks • E.g., 0 means unlocked, 1 means locked:

    int value = compare_and_swap(lock, 0, 1);
    if ( value == 0 ) {
      /* we have the lock */
    } else {
      /* we failed to acquire the lock */
    }
Atomic Compare and Swap • Lock algorithms can be built using compare and swap operations • Most locks use compare and swap to attempt to acquire the lock • Some specialist lock algorithms use other types of instructions • Major difference between different locking algorithms is what to do if you fail to acquire the lock • Two main choices • Go into a loop and repeatedly try to acquire the lock • Known as a spin lock • Keeps spinning until another thread releases the lock • Put the current thread to sleep and add it to a queue of threads waiting for the lock to be released
Atomic Compare and Swap • Spin lock algorithms can be built using compare and swap operations

    void acquire_lock(int * lock)
    {
      int old_val;
      do {
        old_val = compare_and_swap(lock, 0, 1);   /* try to grab the lock */
      } while (old_val == 1);                     /* keep spinning while someone else holds it */
    }

    void release_lock(int * lock)
    {
      fence();     /* maybe we need a memory fence on some architectures */
      *lock = 0;   /* hand the lock back by storing the unlocked value */
    }
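For comparison, a sketch of the same spin lock written with C11 atomics (this sketch assumes the lock is declared as atomic_int rather than plain int); the explicit acquire/release orderings take the place of the ad-hoc fence() call above:

    #include <stdatomic.h>

    void acquire_lock_c11(atomic_int * lock)
    {
      int expected;
      do {
        expected = 0;   /* the CAS succeeds only if the lock currently holds 0 */
      } while (!atomic_compare_exchange_weak_explicit(
                   lock, &expected, 1,
                   memory_order_acquire, memory_order_relaxed));
    }

    void release_lock_c11(atomic_int * lock)
    {
      /* the release store plays the role of the fence in the version above */
      atomic_store_explicit(lock, 0, memory_order_release);
    }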
[Flow chart: spin lock synchronization. Lock: read the lock variable; if it is unlocked (=0), try to lock it using a swap that reads the lock variable and sets it to the locked value (1) in one atomic operation; if the swap reads 0 we succeeded and begin the update of the shared data, otherwise go back and spin. Unlock: finish the update of the shared data, then set the lock variable back to 0. The single winning processor will read a 0 – all other processors will read the 1 set by the winning processor.]
Memory fences • Modern superscalar processors typically execute instructions out of order • In a different order to the original program order • Loads and stores can be executed out of order • Processor may speculate that a load takes its data from a different address than an earlier store • Or the processor may already have the load and store addresses, and thus be able to tell which accesses conflict and which are independent
Memory fences • Each core typically has a load/store queue • Stores are written to the queue, and will be flushed to memory "eventually" • A load may take data directly from the load/store queue if there has been a recent store to the location • So a load instruction may never reach memory • If there is already a store to a memory location in the load/store queue, an additional store to the same location may overwrite the existing store
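To make this concrete, a small sketch of the classic store-buffering pattern (the variables x and y and the two thread functions are hypothetical): because each core's store can sit in its store queue while its load is satisfied early, both threads can observe 0 unless a fence orders each store before the following load. C11's atomic_thread_fence is one way to express such a fence:

    #include <stdatomic.h>

    atomic_int x = 0, y = 0;

    int thread_1(void)
    {
      atomic_store_explicit(&x, 1, memory_order_relaxed);
      atomic_thread_fence(memory_order_seq_cst);              /* order the store before the load */
      return atomic_load_explicit(&y, memory_order_relaxed);  /* r1 */
    }

    int thread_2(void)
    {
      atomic_store_explicit(&y, 1, memory_order_relaxed);
      atomic_thread_fence(memory_order_seq_cst);
      return atomic_load_explicit(&x, memory_order_relaxed);  /* r2 */
    }

With the fences in place the outcome r1 == 0 and r2 == 0 is ruled out; remove them and the stores may still be sitting in the store queues when the loads execute, so both threads can read 0.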
Atomic Compare and Swap • Spin locks are usually faster than putting a waiting thread to sleep • Acquire the lock as soon as it is released • No need to interact with operating system • Works well when lock is seldom contended • But wasteful of resources and energy when the lock is contended • Thread busy-waits, doing no useful work • Large number of memory operations on the bus • Disaster scenario is spinning while waiting for a thread that is not running
Atomic Compare and Swap • Sleeping locks are usually slower • Need operating system call to put thread to sleep • Need to manage queue of waiting threads • Every time the lock is released, you must check the list of waiters • Sleeping locks make better use of total resources • Core is released while thread waits • Works well when lock is often contended and there are more threads than cores • Especially important in virtualized systems • Many lock algorithms are hybrids • Spin for a little while to see if the lock is released quickly • Then go to sleep if it is not • Used by the Linux futex (fast userspace mutex) facility
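A sketch of such a hybrid lock, in the style of the well-known futex-based mutex described by Ulrich Drepper (the spin count of 100 and the 0/1/2 state encoding are illustrative assumptions, not part of these slides):

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Lock states: 0 = unlocked, 1 = locked, 2 = locked and waiters may be sleeping */
    void hybrid_acquire(atomic_int * lock)
    {
      /* Phase 1: spin briefly in the hope that the lock is released quickly */
      for (int i = 0; i < 100; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(lock, &expected, 1))
          return;                                   /* acquired the lock while spinning */
      }
      /* Phase 2: mark the lock as contended and sleep in the kernel until woken */
      while (atomic_exchange(lock, 2) != 0)
        syscall(SYS_futex, lock, FUTEX_WAIT, 2, NULL, NULL, 0);
    }

    void hybrid_release(atomic_int * lock)
    {
      if (atomic_exchange(lock, 0) == 2)            /* might a waiter be asleep? */
        syscall(SYS_futex, lock, FUTEX_WAKE, 1, NULL, NULL, 0);
    }

Note that the release path only makes the (relatively expensive) FUTEX_WAKE system call when the lock was in the contended state, so the uncontended fast path never enters the kernel.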
Locking variants • There are many variants of these basic locks • Spin locks with “back off” • If the lock is contended the thread waits a while before trying again • Avoids the waiting threads constantly trying to access the cache line containing the lock • Can help where the lock is contended by several threads • Queueing spin locks • Makes the queueing fairer
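A sketch of the back-off idea (the doubling schedule, the cap of 1024 iterations and the crude volatile delay loop are illustrative assumptions):

    #include <stdatomic.h>

    void acquire_lock_backoff(atomic_int * lock)
    {
      unsigned delay = 1;
      int expected = 0;
      while (!atomic_compare_exchange_strong_explicit(
                 lock, &expected, 1,
                 memory_order_acquire, memory_order_relaxed)) {
        /* wait away from the lock's cache line before trying again */
        for (volatile unsigned i = 0; i < delay; i++)
          ;                       /* crude delay; real code would use a pause hint */
        if (delay < 1024)
          delay *= 2;             /* back off further after each failure */
        expected = 0;             /* the CAS overwrote expected on failure */
      }
    }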
Atomic Increment • Alternative to atomic compare and swap • Atomically increments a value in memory • Can be used for multithreaded counting without locks • Atomic increment locks • Can build locks from atomic increment • Known as a ticket lock • Based on the ticket system in a passport office, GNIB or Argos • Queueing spin lock • Multiple threads can be waiting for a lock • With a regular spin lock, all waiting threads compete for the lock when it becomes free • With a queueing spin lock, the waiting threads form an orderly FIFO queue • Guarantees fairness
Atomic Increment

    struct ticket_lock {
      int dispense;          /* next ticket number to hand out */
      int current_ticket;    /* ticket number currently being served */
    };

    void acquire_lock(struct ticket_lock * lock)
    {
      /* atomic_increment returns the value before the increment: our ticket */
      int my_ticket = atomic_increment(&(lock->dispense));
      /* spin until our ticket is the one being served */
      while ( my_ticket != lock->current_ticket );
    }

    void unlock(struct ticket_lock * lock)
    {
      /* serve the next ticket; only the lock holder writes this field */
      lock->current_ticket++;
    }
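A sketch of how the atomic_increment helper and the lock's initial state might be realised (these are assumptions for illustration; the slide leaves them abstract). The GCC/Clang __atomic_fetch_add builtin returns the value held before the addition, which is exactly the "take a ticket" behaviour that acquire_lock relies on:

    int atomic_increment(int * address)
    {
      /* atomically add 1 and return the old value */
      return __atomic_fetch_add(address, 1, __ATOMIC_SEQ_CST);
    }

    /* both counters start at 0, so the first ticket dispensed is served immediately */
    struct ticket_lock my_lock = { 0, 0 };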