Inherent limitations facilitate design & verification of concurrent programs

Inherent limitations facilitate design & verification of concurrent programs
Hagit Attiya, Technion

Concurrent Programs
• Core challenge is synchronization
• Correct synchronization is hard to get right
• Efficient synchronization is even harder
Principled, automatic approach

Example I: Verifying locking protocols
Work with Ramalingam and Rinetzky (POPL 2010)

The Goal: Sequential Reductions
Verify concurrent data structures
• Pre-execution static analysis
E.g., linked list with hand-over-hand locking
• No memory leaks, shape (it’s a list), serializability
Find sequential reductions
• Consider only sequential executions
• But conclude that properties hold in all executions

Back-of-envelope estimate of gain
Static analysis of a linked-list algorithm [Amit, Rinetzky, Reps, Sagiv, Yahav, CAV 2007]
• Verifies, e.g., memory safety, sortedness, pointed-to by a variable, heap sharing

Serializability [Papadimitriou ‘79]
[Figure: an interleaved execution of operations is equivalent (~), as observed locally by the threads, to a complete non-interleaved execution]

Serializability gives Sequential Reduction
Concurrent code M
A small subset of all executions
If M is serializable, then a local property φ holds in all executions of M iff φ holds in all complete non-interleaved executions
Easily derived from [Papadimitriou ‘79]

How do we know that M is serializable, without considering all executions?

Special (and common) case: Disciplined programming with locks
• Guard access to data with locks (lock & unlock)
• Only one process holds the lock at each time
• Follow a locking protocol that guarantees conflict serializability
E.g., two-phase locking (2PL) or tree locking (TL)

Two-phase locking [Papadimitriou ‘79]
• Lock acquire (grow) phase followed by lock release (shrink) phase
• No lock is acquired after some lock is released
[Figure: history H with threads t1 and t2]
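The two-phase rule above can be checked mechanically on a single thread's sequence of lock events. The following is a minimal sketch, not the analysis from the talk; the trace encoding as `(action, lock)` pairs and the function name `follows_two_phase_locking` are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical trace encoding: (action, lock_name), action is "acquire" or "release".
Event = Tuple[str, str]


def follows_two_phase_locking(trace: Iterable[Event]) -> bool:
    """Return True if the thread never acquires a lock after releasing one."""
    shrinking = False  # becomes True once the first release is seen
    for action, _lock in trace:
        if action == "release":
            shrinking = True
        elif action == "acquire" and shrinking:
            return False  # an acquire in the shrink phase violates 2PL
    return True


# Grow phase followed by shrink phase: respects 2PL.
ok_trace = [("acquire", "A"), ("acquire", "B"), ("release", "A"), ("release", "B")]
# Acquires A after releasing B: violates 2PL.
bad_trace = [("acquire", "B"), ("release", "B"), ("acquire", "A")]
```

Here `follows_two_phase_locking(ok_trace)` is `True` and `follows_two_phase_locking(bad_trace)` is `False`.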

Tree (hand-over-hand) locking [Kedem & Silberschatz ‘76] [Samadi ‘76] [Bayer & Schkolnick ‘77]
• Except for the first lock, acquire a lock only when holding the lock on its parent
• No lock is acquired after being released
[Figure: history H with threads t1 and t2]
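As an illustration of the two tree-locking rules, here is a minimal sketch of a checker for one thread's lock events. The `parent` map, the node names `n1`..`n3`, and the function name are hypothetical; the sketch only restates the two bullets above and is not the verification technique of the talk.

```python
from typing import Dict, Iterable, Optional, Tuple

Event = Tuple[str, str]  # hypothetical encoding: (action, node)


def follows_tree_locking(trace: Iterable[Event],
                         parent: Dict[str, Optional[str]]) -> bool:
    """Check one thread's trace against the tree (hand-over-hand) locking rules."""
    held = set()      # locks currently held by the thread
    released = set()  # locks the thread has already released
    first = True      # the first acquire may target any node
    for action, node in trace:
        if action == "acquire":
            if node in held or node in released:
                return False  # no lock is acquired after being released
            if not first and parent.get(node) not in held:
                return False  # must hold the parent's lock
            held.add(node)
            first = False
        elif action == "release":
            if node not in held:
                return False  # releasing a lock that is not held
            held.remove(node)
            released.add(node)
    return True


# A three-node list n1 -> n2 -> n3 viewed as a degenerate tree.
parent = {"n1": None, "n2": "n1", "n3": "n2"}
hand_over_hand = [("acquire", "n1"), ("acquire", "n2"), ("release", "n1"),
                  ("acquire", "n3"), ("release", "n2"), ("release", "n3")]
skips_parent = [("acquire", "n1"), ("release", "n1"), ("acquire", "n2")]
```

Note that `hand_over_hand` acquires `n3` after releasing `n1`, so it respects tree locking without being two-phase.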


void p() {
  acquire(B)
  B = 0
  release(B)
  int b = B
  if (b) acquire(A)
}
void q() {
  acquire(B)
  B = 1
  release(B)
}
Not two-phase locked, but only in interleaved executions
Yes!
• for databases
• concurrency control monitor ensures that M follows the locking policy at run-time ⇒ M is serializable
No!
• for code analysis
• no central monitor
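To see why the violation shows up only in interleaved executions, the two methods can be explored exhaustively on a small model. This is a sketch under stated assumptions: each statement of p() and q() is treated as one atomic step, B starts at 0, and the helper names `step` and `some_execution_violates` are made up for this illustration.

```python
P_STEPS, Q_STEPS = 5, 3  # number of modeled steps in p() and q()


def step(thread, pc, B, b, holder):
    """Run one step; return (B, b, holder, violation), or None if blocked on B."""
    if thread == "p":
        if pc == 0:  # acquire(B)
            return None if holder is not None else (B, b, "p", False)
        if pc == 1:  # B = 0
            return (0, b, holder, False)
        if pc == 2:  # release(B)
            return (B, b, None, False)
        if pc == 3:  # int b = B (read without holding the lock)
            return (B, B, holder, False)
        # pc == 4: if (b) acquire(A), which is an acquire after release(B)
        return (B, b, holder, b != 0)
    if pc == 0:  # q: acquire(B)
        return None if holder is not None else (B, b, "q", False)
    if pc == 1:  # q: B = 1
        return (1, b, holder, False)
    return (B, b, None, False)  # q: release(B)


def some_execution_violates(non_interleaved_only: bool) -> bool:
    """True if some explored execution makes p acquire A after releasing B."""
    def dfs(pc_p, pc_q, B, b, holder, running):
        for thread, pc, limit in (("p", pc_p, P_STEPS), ("q", pc_q, Q_STEPS)):
            if pc >= limit:
                continue  # this method has finished
            if non_interleaved_only and running not in (None, thread):
                continue  # another method is still in progress
            result = step(thread, pc, B, b, holder)
            if result is None:
                continue  # blocked on the lock
            new_B, new_b, new_holder, violation = result
            if violation:
                return True
            next_p = pc_p + (thread == "p")
            next_q = pc_q + (thread == "q")
            finished = (next_p if thread == "p" else next_q) >= limit
            if dfs(next_p, next_q, new_B, new_b, new_holder,
                   None if finished else thread):
                return True
        return False
    return dfs(0, 0, 0, 0, None, None)
```

In this model `some_execution_violates(True)` is `False` while `some_execution_violates(False)` is `True`: p reads `B == 1` only when q runs between p's `release(B)` and its unprotected read.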

Our Goal
Statically verify that M follows a locking policy
For local conflict-serializable locking protocols
• Depending only on thread’s local variables & global variables locked by it
E.g., two-phase locking, tree locking, (dynamic) DAG locking…
But not protocols that rely on a centralized concurrency control monitor!

Our contribution: Easy step
Complete non-interleaved executions of M
Two-phase locking, tree locking, dynamic tree locking
• A local conflict-serializable locking policy is respected in all executions iff it is respected in all non-interleaved executions
• A thread-local property holds in all executions iff it holds in all non-interleaved executions

Our contribution: Easy step
Complete non-interleaved executions of M
Proof considers the shortest execution violating the protocol + indistinguishability argument
• A local conflict-serializable locking policy is respected in all executions iff it is respected in all non-interleaved executions

Further reduction
Almost-complete non-interleaved executions of M
A local conflict-serializable locking policy is respected in all executions iff it is respected in all almost-complete non-interleaved executions

Further reduction: A complication
Need to argue about termination
int X=0, Y=0
void p() {
  acquire(Y)
  int y = Y
  release(Y)
  if (y ≠ 0) acquire(X) X = 3 release(X)
}
void q() {
  if (random(5) == 3) {
    acquire(Y)
    Y = 1
    release(Y)
    while (true) nop
  }
}
• In p: observes Y == 1 & violates 2PL
• In q: Y is set to 1 & the method enters an infinite loop
Cannot happen in complete non-interleaved executions

Further reduction: Termination
→ Can use sequential reduction to verify termination
A terminating local conflict-serializable locking policy is respected in all executions iff it is respected in all almost-complete non-interleaved executions

Initial analysis results
• Shape analysis of hand-over-hand linked lists
* Does not verify sortedness of list and fails to verify linearizability in some cases
• Shape analysis of hand-over-hand trees (for the first time)

What’s next?
• Extend to other serializability protocols
  • shared (read) locks
  • non-locking, non-conflict-based serializability (e.g., using timestamps)
  • optimistic protocols
• Aborted / failed methods

Example II: Required memory orderings
Work with Guerraoui, Hendler, Kuznetsov, Michael and Vechev (POPL 2011)

Relaxed memory models
• Out-of-order execution of memory accesses, to compensate for slow writes
• Optimize by issuing a read before an earlier write, if they access different locations
• Reordering may lead to inconsistency
[Figure: CPU 0 and CPU 1, each with its own cache, connected through an interconnect to memory]

Read-after-write (RAW) Reordering
Process P:
• Write(X,1)
• Read(Y)
Process Q:
• Write(Y,1)
• Read(X)
[Figure: timelines of P and Q, with P's R(Y) reordered relative to its W(X,1) while Q performs W(Y,1), R(X)]
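The effect of RAW reordering on this two-process example can be enumerated on a toy model. The sketch below assumes X and Y start at 0 and models reordering simply as swapping a process's write and read (they touch different locations); it is a simplification for illustration, not a model of any particular hardware. The names `interleavings` and `outcomes` are hypothetical.

```python
from itertools import combinations


def interleavings(p_ops, q_ops):
    """Yield every merge of p_ops and q_ops that keeps each list's own order."""
    n = len(p_ops) + len(q_ops)
    for p_slots in combinations(range(n), len(p_ops)):
        p_iter, q_iter = iter(p_ops), iter(q_ops)
        yield [next(p_iter) if i in p_slots else next(q_iter) for i in range(n)]


def outcomes(allow_raw_reordering: bool):
    """Return the set of (value read by P, value read by Q) over all schedules."""
    p_prog = [("P", "write", "X"), ("P", "read", "Y")]
    q_prog = [("Q", "write", "Y"), ("Q", "read", "X")]

    def orders(prog):
        yield prog  # program order
        if allow_raw_reordering:
            yield prog[::-1]  # the read is issued before the earlier write

    results = set()
    for p_ops in orders(p_prog):
        for q_ops in orders(q_prog):
            for schedule in interleavings(p_ops, q_ops):
                mem = {"X": 0, "Y": 0}  # assumed initial values
                seen = {}
                for proc, kind, loc in schedule:
                    if kind == "write":
                        mem[loc] = 1
                    else:
                        seen[proc] = mem[loc]
                results.add((seen["P"], seen["Q"]))
    return results
```

In this model `sorted(outcomes(False))` is `[(0, 1), (1, 0), (1, 1)]`, while `outcomes(True)` additionally contains `(0, 0)`: both processes read the initial value, which no in-order interleaving allows.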

Avoiding out-of-order: Read-after-write (RAW) Fence
Process P:
• Write(X,1)
• FENCE
• Read(Y)
Process Q:
• Write(Y,1)
• FENCE
• Read(X)
[Figure: timelines of P (W(X,1), R(Y)) and Q (W(Y,1), R(X))]

Avoiding out-of-order: Atomic Operations
Atomic operations: atomic-write-after-read (AWAR)
E.g., CAS, TAS, Fetch&Add,…
atomic {
  read(Y)
  …
  write(X,1)
}
RAW fences / AWAR are ~60 slower than (remote) memory accesses

Our result
• Concurrent data types:
  • queues, counters, hash tables, trees,…
  • Non-commutative operations
  • Serializable solo-terminating implementations
• Mutual exclusion
Any concurrent program in a certain class must use RAW / AWARs

Non-commutative operations
Operation A is non-commutative if there is an operation B such that A influences B and B influences A

Example: Queue
• enq(v) adds v to the end of the queue
• deq() takes an item from the head of the queue
Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1
deq() operations influence each other
Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok
enq() is not non-commutative
[Figure: queue Q with contents 1 2, and with contents 1 2 3]
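The queue example can be replayed on a sequential queue to see which operations influence which. The sketch below uses a simplified reading of "influences" (running one operation first changes the other's response from a given starting state); this reading, and the helper names `run` and `influences`, are assumptions for illustration rather than the formal definition used in the work.

```python
from collections import deque


def run(initial, ops):
    """Apply ops to a fresh queue holding `initial`; return the list of responses."""
    q = deque(initial)
    responses = []
    for name, arg in ops:
        if name == "enq":
            q.append(arg)
            responses.append("ok")
        else:  # "deq": take the item at the head, or None if the queue is empty
            responses.append(q.popleft() if q else None)
    return responses


def influences(initial, first, second):
    """True if running `first` beforehand changes the response of `second`."""
    alone = run(initial, [second])[0]
    after = run(initial, [first, second])[1]
    return alone != after


deq, enq3 = ("deq", None), ("enq", 3)
```

On the queue `[1, 2]`, `influences([1, 2], deq, deq)` is `True` in both roles, so deq() is non-commutative. For enq(3), `influences([], enq3, deq)` is `True`, but `influences([1, 2], deq, enq3)` and `influences([], deq, enq3)` are both `False` because enq always returns ok, so the influence is one-directional.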

Proof Intuition: Writing
If an operation does not write, it does not influence anyone, so it would be commutative
[Figure: two deq operations, each returning 1; with no shared write, the deq operations do not influence each other]

Proof Intuition: Reading
If an operation does not read, it is not influenced by anyone, so it would be commutative
[Figure: two deq operations, each returning 1; with no shared read, the deq operations do not influence each other]

Proof Intuition: RAW
[Figure: two deq operations, each returning 1, in an execution with a write (W) but no RAW, shown next to a serialization of the two deq operations]

Mutual exclusion (Mutex)
• Two processes do not hold the lock at the same time
• (Deadlock-freedom) If a process calls Lock() then some process acquires the lock
Lock() operations do not “commute”!
Every successful Lock() incurs a RAW / AWAR

Who should care?
• Concurrent programmers: know when it is futile to try to avoid expensive synchronization
• Hardware designers: motivation to lower the cost of specific synchronization constructs
• API designers: choice of API affects synchronization
• Verification engineers: declare incorrect when synchronization is missing
“…although I hope that these shortcomings will be addressed, I hasten to add that they are insignificant compared to the huge step forward that this paper represents….” -- Paul McKenney, Linux Weekly News, Jan 26, 2011

What else?
• Weaker operations? E.g., idempotent work stealing
• Other patterns
  • Read-after-read, write-after-write, barriers, across-thread orders
• The cost of verifying adherence to a locking policy
• (Semi-)automatic insertion of lock acquire / release commands or fences

And beyond…
Other theorems that allow “cutting corners” when designing / verifying concurrent applications

Thank you!
