Reactive Spin-locks: A Self-tuning Approach

Phuong Hoai Ha, Marina Papatriantafilou, Philippas Tsigas
I-SPAN '05, Las Vegas, Dec. 7th – 9th, 2005

Outline
• Mutual exclusion
• Overhead
• Available reactive spin-locks
• New reactive spin-lock
• Model
• Algorithm
• Evaluation
• Conclusions

Mutual exclusion
[Diagram: Noncritical section, Entry section, Critical section, Exit section]
• Performance goals:
  • Low latency
  • Low contention
  • …
[Diagram labels: Requests issued, Lock released, Arbitration, Lock sent to winner]

Spin-lock categories
• Arbitrating locks:
  • Determine who is the next lock-holder in advance, e.g. ticket locks, queue locks.
  • Advantages:
    • Prevent processors from causing bursts in network traffic and high contention on the lock.
• Non-arbitrating locks:
  • E.g. test-and-set locks
  • Advantages:
    • Exploit locality/cache
    • Tolerate failures in the Entry section.
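As a rough illustration of the arbitrating category, the sketch below models a ticket lock in Python. It is only a model under stated assumptions: Python has no hardware fetch-and-add, so a `threading.Lock` stands in for that atomic instruction, and the names `TicketLock` and `served_order` are invented for this example.

```python
import threading
import time


class TicketLock:
    """Toy ticket lock: waiters are served strictly in ticket order."""

    def __init__(self):
        # Assumption: this mutex only emulates an atomic fetch-and-add.
        self._guard = threading.Lock()
        self._next_ticket = 0
        self._now_serving = 0

    def acquire(self):
        with self._guard:                  # emulated fetch-and-add
            my_ticket = self._next_ticket
            self._next_ticket += 1
        while self._now_serving != my_ticket:
            time.sleep(0.0001)             # wait until it is our turn
        return my_ticket

    def release(self):
        self._now_serving += 1             # only the lock holder writes this


def served_order(n_threads=4, rounds=5):
    """Return the tickets in the order their critical sections ran."""
    lock = TicketLock()
    order = []

    def worker():
        for _ in range(rounds):
            ticket = lock.acquire()
            order.append(ticket)           # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

The next holder is fixed the moment a ticket is drawn, which is what "in advance" means above. A test-and-set lock has no such queue: whichever waiter's atomic instruction lands first after a release wins.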

Arbitrating vs. non-arbitrating locks
[Figure: two interconnection-network diagrams, with labels 1 to 6]

Available reactive spin-lock algorithms
• Drawbacks:
  • Their reactive schemes rely on
    • Fixed experimental thresholds
      • The thresholds frequently become inappropriate in variable and unpredictable environments such as multiprogramming systems
      • E.g. ticket locks with proportional backoff, test-and-test-and-set locks with exponential backoff
    • Known probability distributions of some inputs
      • This assumption is usually not feasible.
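The two backoff schemes named above can be sketched as plain delay functions. This is a minimal sketch, not code from the talk: the function names and the `base` and `cap` parameters are assumptions, and they play the role of the fixed, experimentally tuned thresholds the slide criticizes.

```python
def exponential_backoff(attempt, base=1.0, cap=64.0):
    """Delay before retry number `attempt` (0-based): doubles up to `cap`.

    `base` and `cap` are hypothetical tuning constants.
    """
    return min(cap, base * 2 ** attempt)


def proportional_backoff(my_ticket, now_serving, base=1.0):
    """Delay proportional to the number of waiters ahead in a ticket lock.

    `base` is a hypothetical estimate of one critical-section duration.
    """
    return base * (my_ticket - now_serving)
```

For instance, `exponential_backoff(3)` gives 8.0 and `proportional_backoff(7, 4, base=2.0)` gives 6.0. A `base` tuned for one workload can be far off for another, which is the drawback listed above.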

New reactive spin-lock algorithm
• Ideas
  • A non-arbitrating lock with an adaptive, sensible backoff delay.
• Advantages
  • Its reactive scheme is self-tuning
    • Neither experimentally tuned thresholds nor probability distributions of inputs are needed
  • It combines the advantages of both arbitrating and non-arbitrating spin-lock categories.
    • It can exploit locality as well as reduce contention on the lock.

Find sensible backoff delay
• Need to optimize the trade-off between:
  • Latency
    • The interval between a lock release and the following lock acquisition
  • Contention on the lock
• This is an online problem.
[Figure labels: delay = ?, Load on the lock]

Reactive scheme
• Bounds for loads on the lock: 1 ≤ l_t ≤ P
• During a load-rising phase:
  • Increase the delay only when the load on the lock is the highest so far
  • When increasing the delay, increase it just enough to keep the competitive ratio c = P - (P-1)/P^{1/(P-1)}
• Similar for the load-dropping phase
• In each load-rising/load-dropping phase, the reactive scheme is competitive with competitive ratio c = Θ(ln(P))
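To get a feel for the numbers, the sketch below evaluates the ratio stated on this slide, read here as c = P - (P-1)/P^{1/(P-1)} (the exponent is an interpretation of the flattened slide text), and compares it with ln(P). The function name `competitive_ratio` is made up for this example.

```python
import math


def competitive_ratio(p):
    """c = P - (P-1) / P**(1/(P-1)), for P >= 2 processors."""
    if p < 2:
        raise ValueError("the formula needs at least 2 processors")
    return p - (p - 1) / p ** (1.0 / (p - 1))


if __name__ == "__main__":
    for p in (2, 4, 28, 1024):
        print(p, round(competitive_ratio(p), 3), round(math.log(p), 3))
```

For P = 2 the ratio is exactly 1.5. For every P ≥ 2 the value stays between ln(P) and 1 + ln(P), which is consistent with logarithmic growth in P.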

Algorithm
• The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).
[Figure: interconnection-network diagram with labels 0 to 4]

Evaluation
• Benchmarks
  • Spark98 kernel: lmv
  • SPLASH-2 suite: Volrend and Radiosity
• Representatives:
  • Arbitrating: ticket lock with (tuned) proportional backoff
  • Non-arbitrating: test-and-test-and-set lock with (tuned) exponential backoff
• System
  • A ccNUMA SGI Origin2000 with 28 250 MHz MIPS R10000 processors.

Experimental results

Experimental results (2)

Experimental results (3)

Conclusions
• We have designed and implemented a new reactive spin-lock:
  • It is self-tuning.
  • It combines the advantages of both arbitrating and non-arbitrating locks
  • Its reactive scheme is competitive with c = Θ(ln(P))
⇒ The lock automatically adjusts its backoff delay to the load on the lock as well as to the application

Thanks for your attention!

Estimate delay bases
• Fairness
  • A fair lock helps a parallel application gain performance, since the application's threads can execute their non-critical sections in parallel.
  • Definition: [formula], where n_i is the number of lock acquisitions of a processor in t and N is the number of processors
• Heuristic to estimate base_l: [formula], where a, b are system-documented constants and DoCS is the delay outside the critical section (CS)

NUMA
• Another parameter that makes the problem harder is NUMA
  • Latency differs widely between memory accesses
  • E.g. ccNUMA SGI Origin2000

Model: An online problem
• A sequence of loads on the lock unfolds on the fly.
• When observing a load, the algorithm must decide how much its current backoff delay should be lengthened.
  • If it increases the delay too soon, it wastes time on a long delay when the lock becomes available
  • If it does not increase the delay in time, it causes high contention on the lock
⇒ It must increase the delay at high loads, by a reasonable amount
⇒ The goal is to maximize Σ_t delay_t · load_t, where Σ_t delay_t ≤ P
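The trade-off in this model can be made concrete with a small calculation. The sketch below assumes the goal reads as "maximize the sum of delay_t × load_t under a bound on the total delay", which is an interpretation of the slide. The function names, the load sequence and the two schedules are invented for illustration and are not the authors' reactive scheme.

```python
def objective(loads, delay_increments):
    """Sum over t of delay_t * load_t for one schedule of delay increments."""
    if len(loads) != len(delay_increments):
        raise ValueError("one delay increment per observed load is required")
    return sum(d * l for d, l in zip(delay_increments, loads))


def offline_optimum(loads, budget):
    """Best value with hindsight: spend the whole budget at the peak load."""
    return budget * max(loads)


# Hypothetical load sequence and delay budget.
loads = [1, 3, 2, 4]
budget = 4

too_early = [4, 0, 0, 0]   # whole budget spent on the first, lowest load
at_peak = [0, 0, 0, 4]     # whole budget spent at the highest load
```

Here `objective(loads, too_early)` is 4 while `objective(loads, at_peak)` is 16, the offline optimum. An online algorithm cannot know that the last load is the peak, so it has to spread its increases, and the competitive ratio measures how much it loses against this hindsight value.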

Algorithm
LockType: <lock, counter>
Initial delay = L.counter × base_l
The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).

Acquire(Lock pL)
    L = FAA(pL.L, <1,1>)
    if L.lock then
        delay = ComputeDelay(L)
        cond = <1,0>
        do
            sleep(delay)
            L = pL.L
            if L.lock then
                delay = ComputeDelay(L)
                continue
            cond = FAA(pL.L, <1,0>)
        while cond.lock

Release(Lock pL)
    do
        L = pL.L
    while not CAS(pL.L, L, <0, L.counter-1>)
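The pseudocode above can be exercised with a small Python model. This is a sketch under several assumptions, not the authors' implementation: a `threading.Lock` emulates the atomic FAA and CAS instructions, `compute_delay` simply returns counter × base (the slide's initial delay, not the full reactive scheme), and `BASE_DELAY`, `LockWord` and `run_demo` are hypothetical names.

```python
import threading
import time

BASE_DELAY = 1e-5  # hypothetical base_l, in seconds


class LockWord:
    """The <lock, counter> word; a mutex emulates atomic FAA and CAS."""

    def __init__(self):
        self._guard = threading.Lock()
        self._value = (0, 0)               # (lock, counter)

    def load(self):
        with self._guard:
            return self._value

    def fetch_and_add(self, d_lock, d_counter):
        with self._guard:
            old = self._value
            self._value = (old[0] + d_lock, old[1] + d_counter)
            return old

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def compute_delay(value):
    # Simplification: delay = counter * base, with no reactive adjustment.
    return value[1] * BASE_DELAY


def acquire(word):
    old = word.fetch_and_add(1, 1)         # announce ourselves, try the lock
    if old[0] == 0:
        return                             # lock was free: acquired
    delay = compute_delay(old)
    while True:
        time.sleep(delay)
        seen = word.load()
        if seen[0] != 0:                   # still held: recompute and wait
            delay = compute_delay(seen)
            continue
        if word.fetch_and_add(1, 0)[0] == 0:
            return                         # we were first after the release


def release(word):
    while True:
        seen = word.load()
        if word.compare_and_swap(seen, (0, seen[1] - 1)):
            return


def run_demo(n_threads=4, iters=50):
    word, shared = LockWord(), {"value": 0}

    def worker():
        for _ in range(iters):
            acquire(word)
            shared["value"] += 1           # critical section
            release(word)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"], word.load()
```

In this model the counter field tracks how many processors currently hold or wait for the lock, which is the load that the delay computation reads. After all threads finish, the word is back at <0, 0>.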
