Microprocessors

Microprocessors Synchronization Instructions April 16th, 2002

Consider Simple Monitor Case • Logically we want to: • Test if lock is set; if so wait, else set lock • Do the processing that needs to lock out others • Release the lock • Example: two threads need to access the screen, but there is only one cursor register, so they must not interfere with one another.

The Critical Lock Step • Test if lock is set, if so wait, else set lock • The point is that this must be atomic • The following code is wrong. Why? • Move mem to register • Test if register is non-zero • If so, wait • Else set mem non-zero

The Critical Lock Step • Test if lock is set, if so wait, else set lock • The point is that this must be atomic • The following code is wrong. Why? • Move mem to register • Test if register is non-zero • -> BECAUSE AN INTERRUPT CAN OCCUR HERE (another thread can then see the lock as free and set it too) • If so, wait • Else set mem non-zero
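
The failure can be replayed deterministically. The following is a minimal sketch in C, not code from the lecture: the names lock_word, broken_read, broken_test_and_claim and replay_race are hypothetical, and the "interrupt" is simulated by simply running the second thread's read between the first thread's read and its test.

```c
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, non-zero = held. */
int lock_word = 0;

/* Step 1 of the broken sequence: move mem to register. */
int broken_read(void) {
    return lock_word;
}

/* Remaining steps: test the register, and set mem non-zero if the
   lock looked free. Returns true when the caller believes it now
   holds the lock. */
bool broken_test_and_claim(int reg) {
    if (reg != 0) {
        return false;   /* lock appeared held: caller would wait */
    }
    lock_word = 1;      /* lock appeared free: claim it */
    return true;
}

/* Replays the bad interleaving: thread A reads, an interrupt lets
   thread B read as well, then both finish the sequence. Returns how
   many threads believe they hold the lock. */
int replay_race(void) {
    lock_word = 0;
    int reg_a = broken_read();            /* thread A: mem -> register */
    /* simulated interrupt here: thread B gets to run */
    int reg_b = broken_read();            /* thread B: mem -> register */
    bool b_holds = broken_test_and_claim(reg_b);
    bool a_holds = broken_test_and_claim(reg_a);
    return (int)a_holds + (int)b_holds;
}
```

In this replay both threads read zero before either one writes, so both conclude that they own the lock, which is exactly what an atomic test-and-set rules out.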

Let the OS Do Things • Set address of lock in a register • Trap to operating system • Let the operating system take care of it • Works fine … • But … • Operating system traps are expensive • Most of the time the lock was free • So this is very inefficient

Hardware Instructions • Test and Set • Sets memory non-zero • And sets a flag to indicate if it was non-zero before the operation was done. • This instruction must be indivisible • Requires indivisible memory read/write • This is the classical instruction • E.g. the TS instruction on the IBM mainframe
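
As an illustration, a lock built on test-and-set might look like the sketch below. It assumes C11's atomic_flag_test_and_set from <stdatomic.h> as a stand-in for the hardware instruction; the names screen_lock, try_lock, lock_acquire and lock_release are hypothetical and not from the lecture.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock guarding the screen's single cursor register. */
atomic_flag screen_lock = ATOMIC_FLAG_INIT;

/* One indivisible test-and-set: sets the flag and reports whether it
   was already set. Returns true if the caller obtained the lock. */
bool try_lock(atomic_flag *lock) {
    return !atomic_flag_test_and_set(lock);
}

/* Test if lock is set; if so wait (spin), else it is now set by us. */
void lock_acquire(atomic_flag *lock) {
    while (atomic_flag_test_and_set(lock)) {
        /* lock was already set by someone else: keep waiting */
    }
}

/* Release the lock by clearing the flag back to zero. */
void lock_release(atomic_flag *lock) {
    atomic_flag_clear(lock);
}
```

A thread would call lock_acquire(&screen_lock), update the cursor, and then call lock_release(&screen_lock). Spinning is only one way to "wait"; a real system might yield or block instead.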

Other Hardware Instructions • Exchange memory and register • We can simulate TS with this • Register has non-zero value • Memory is lock • Do exchange • Test what was in memory before
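
The simulation of TS with an exchange can be sketched as follows, assuming C11's atomic_exchange as a stand-in for the hardware exchange instruction. The function name ts_via_exchange is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simulates test-and-set with an atomic exchange.
   Returns true if the lock was already set before the operation. */
bool ts_via_exchange(atomic_int *mem) {
    int reg = 1;                      /* register has non-zero value */
    reg = atomic_exchange(mem, reg);  /* do exchange: mem <-> register */
    return reg != 0;                  /* test what was in memory before */
}
```

After the call the memory word is non-zero in either case, and the returned value plays the role of the flag that TS would have set.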

Other Hardware Instructions • Replace-Add • register ← mem ← mem + register • Much more powerful. Consider the following problem. • A queue of tasks is to be executed • Each thread wants to grab one operation • Without having to do fancy synchronization

Solution to that Problem • Represent the tasks in an array, one task per array entry. Tasks are indexed by subscripts in the range N .. M. • A mem variable holds the subscript of the next task to take. • Each thread does Replace-Add of 1 on this mem variable. • Each thread gets a unique index
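
A minimal sketch of this scheme is given below. It assumes C11's atomic_fetch_add as a stand-in for Replace-Add; note that atomic_fetch_add returns the value held before the addition, so a hardware instruction that returns the updated value would need the starting value adjusted by one. The bounds FIRST_TASK and LAST_TASK (the slide's N and M) and the names next_task and take_task are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical bounds standing in for N and M on the slide. */
#define FIRST_TASK 3
#define LAST_TASK  7

/* The shared mem variable: subscript of the next task to take. */
atomic_int next_task = FIRST_TASK;

/* Claims one task. The add is indivisible, so no two callers can be
   handed the same subscript. Returns the claimed subscript, or -1
   once every task in FIRST_TASK..LAST_TASK has been handed out. */
int take_task(void) {
    int index = atomic_fetch_add(&next_task, 1);  /* old value comes back */
    return (index <= LAST_TASK) ? index : -1;
}
```

Each thread simply loops on take_task() until it returns -1; no separate lock around the queue is needed.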

Caching Considerations • The lock memory locations cannot simply be cached • Because two different threads on two different processors could get confused • If each has its own cached copy of the memory • So either the caches must be coherent • Or the memory location must not be cached at all.
