Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets

Tayfun Elmas¹, Shaz Qadeer², Serdar Tasiran¹
¹Koç University, İstanbul, Turkey
²Microsoft Research, Redmond, WA
FATES/RV'06, August 15-16, Seattle, WA

Our goal
• Continuous runtime monitoring of concurrent Java programs
• Target: Race conditions
• Criteria
  • Efficiency: Tolerable impact on performance
  • Precision: Prevent false alarms
• The Java Memory Model (JMM) [Manson et al., POPL'05]
  • "Two accesses form a data race in an execution of a program if
    • they conflict,
    • they are from different threads and
    • they are not ordered by happens-before (H-B)."
• Exact H-B computation ⇒ precise race detection

Existing dynamic approaches
• Vector-clock algorithms [Mattern, 1989]
  • Vector clock: For each thread and variable, a vector of logical clocks
  • Vector has size T = #threads
  • Vector updated at each synchronization operation
  • Precise but inefficient in some cases
    • O(T) computation at each synchronization operation
    • Other algorithms use cheaper checks for well-protected variables
      • Thread-local variables, variables protected by single locks
• Lockset algorithms [Savage et al., 1997]
  • Lockset: A set of locks protecting access to variable d
  • Lockset update rules specific to a synchronization discipline
  • Efficient, intuitive, but imprecise
    • False alarms: Synchronization discipline violated but no race occurred
    • Additional mechanisms to reduce false alarms
      • State machines for object initialization, escape, thread-locality
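To make the O(T) cost of vector clocks concrete, here is a minimal sketch assuming two threads and one lock. The names `vc_join`, `vc_leq`, `lock_clock` and `write_stamp` are illustrative and not taken from the talk, and the scheme is simplified to a single stamp per variable.

```python
from typing import List


def vc_join(a: List[int], b: List[int]) -> List[int]:
    """Component-wise maximum of two clocks: O(T) work per synchronization."""
    return [max(x, y) for x, y in zip(a, b)]


def vc_leq(a: List[int], b: List[int]) -> bool:
    """True if the event stamped `a` is ordered before (or equals) the one stamped `b`."""
    return all(x <= y for x, y in zip(a, b))


# Two threads (T = 2); clock[t] is thread t's vector clock.
clock = [[1, 0], [0, 1]]
lock_clock = [0, 0]  # clock stored with a hypothetical lock

# Thread 0 writes d, then releases the lock.
write_stamp = list(clock[0])                # d's last write is stamped [1, 0]
lock_clock = vc_join(lock_clock, clock[0])  # release publishes thread 0's clock
clock[0][0] += 1                            # thread 0 moves to its next epoch

# Thread 1 acquires the same lock, then reads d.
clock[1] = vc_join(clock[1], lock_clock)    # acquire imports the lock's clock
ordered = vc_leq(write_stamp, clock[1])     # the write is ordered before the read

# A thread that never synchronized still carries its initial clock.
unsynchronized = [0, 1]
racy = not vc_leq(write_stamp, unsynchronized)
```

Both `vc_join` and `vc_leq` touch every component of the vector, which is the per-operation cost the slide refers to.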

Our work
• The Goldilocks algorithm
  • Novel lockset-based method that precisely computes H-B
  • As efficient as other lockset algorithms
  • As precise as vector-clocks
• Uniformly captures all synchronization disciplines
  • Our locksets contain locks, volatile variables, thread ids
• Theorem: When thread t accesses variable d, there is no race iff the lockset of d at that point contains t
  • Sound: Detects all apparent races that occur in execution
  • Precise: Race reported ⇒ Two accesses not ordered by H-B
    • No false alarms
    • No alarms about potential races in similar executions

Outline
• The Goldilocks algorithm
• Implementation
• Evaluation
• Conclusions

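Example
class IntBox { int x; }
Global variables: a, b: IntBox; o1.x, o2.x: int
T1: a := IntBox(); b := IntBox(); acquire(L1); a.x++; release(L1)
T2: acquire(L1); acquire(L2); tmp := a; a := b; b := tmp; release(L1); release(L2)
T3: acquire(L2); b.x++; release(L2)
Heap before T2's swap: a refers to o1, b refers to o2. After the swap: a refers to o2, b refers to o1.
Lock hand-offs: L1 from T1 to T2, L2 from T2 to T3.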

Eraser
• T1: a := IntBox(); b := IntBox()
  • LS(o1.x) = {all locks}
• T1: acquire(L1); a.x++; release(L1)
  • At a.x++: check whether LS(o1.x) ∩ LH(T1) = ∅
  • LS(o1.x) = {all locks} ∩ {L1} = {L1}
• T2: acquire(L1); acquire(L2); tmp := a; a := b; b := tmp; release(L1); release(L2)
  • No access to o1.x, LS(o1.x) not modified
• T3: acquire(L2); b.x++; release(L2)
  • At b.x++: check whether LS(o1.x) ∩ LH(T3) = ∅
  • LS(o1.x) = {L1} ∩ {L2} = ∅
  • Race reported!
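The false alarm above can be reproduced with a minimal sketch of basic Eraser-style lockset refinement. It assumes a trace of (thread, operation, target) tuples, tracks only o1.x, and omits Eraser's state machines; the function name `eraser_alarms` is illustrative.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, str]  # (thread, operation, lock or variable)


def eraser_alarms(trace: List[Event]) -> List[Tuple[str, str]]:
    """Return (thread, variable) pairs at which the candidate lockset became empty."""
    held: Dict[str, Set[str]] = {}       # LH(t): locks currently held by thread t
    candidate: Dict[str, Set[str]] = {}  # LS(d); an absent key stands for {all locks}
    alarms: List[Tuple[str, str]] = []
    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acquire":
            locks.add(target)
        elif op == "release":
            locks.discard(target)
        elif op == "access":
            if target not in candidate:
                candidate[target] = set(locks)  # {all locks} ∩ LH(t) = LH(t)
            else:
                candidate[target] &= locks      # LS(d) := LS(d) ∩ LH(t)
            if not candidate[target]:
                alarms.append((thread, target))
    return alarms


# The example from the slides, restricted to accesses to o1.x.
trace = [
    ("T1", "acquire", "L1"), ("T1", "access", "o1.x"), ("T1", "release", "L1"),
    ("T2", "acquire", "L1"), ("T2", "acquire", "L2"),
    ("T2", "release", "L1"), ("T2", "release", "L2"),
    ("T3", "acquire", "L2"), ("T3", "access", "o1.x"), ("T3", "release", "L2"),
]
alarms = eraser_alarms(trace)
```

T2 never accesses o1.x, so its synchronization leaves the candidate set untouched, and the intersection at T3's access comes out empty even though the two accesses are ordered by happens-before.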

a := IntBox() b := IntBox() acquire(L1) p release(L1) sw acquire(L1) acquire(L2) tmp:= a p hb a := b b := tmp release(L1) release(L2) sw acquire(L2) p release(L2) The happens-before relation • Happens-before in JMM: hb • Transitive closure of • Program orders of threads: p • Synchronizes-with: sw • release(l) sw acquire(l) • vol-write(v) sw vol-read(v) • fork(t) hb(action of t) • (action of t) hb join(t) T1 a.x ++ T2 b.x ++ T3

Goldilocks intuition
• LS: (Variables) → ℘(Threads ∪ Locks ∪ Volatiles)
• Update rules maintain invariants:
  • Thread t ∈ LS(d) ⇒ t is owner of d
    • Accesses to d by t are race-free
  • Lock l ∈ LS(d) ⇒ acquire l to become owner of d
  • Volatile v ∈ LS(d) ⇒ read v to become owner of d
• When t accesses d: Race-free iff (t ∈ LS(d))
• After t accesses d: LS(d) = { t }
  • t is the only owner of d
  • Other threads: Must synchronize with t in order to become an owner of d

Lockset update rules
• Ownership transfer between threads
• LS(d) grows through synchronization actions
• release(l) by t
  • For each variable d: if (t ∈ LS(d)) → (add l to LS(d))
• acquire(l) by t
  • For each variable d: if (l ∈ LS(d)) → (add t to LS(d))
• volatile-write(v) by t
  • For each variable d: if (t ∈ LS(d)) → (add v to LS(d))
• volatile-read(v) by t
  • For each variable d: if (v ∈ LS(d)) → (add t to LS(d))
• fork(s) by t
  • For each variable d: if (t ∈ LS(d)) → (add s to LS(d))
• join(s) by t
  • For each variable d: if (s ∈ LS(d)) → (add t to LS(d))
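The six rules fall into two shapes: actions that pass ownership from the acting thread to a target, and actions that pass it from a target to the acting thread. The sketch below applies them eagerly to every lockset. The class `EagerGoldilocks`, the string names for threads, locks and volatiles, and the choice to treat a first access as race-free are assumptions made for illustration.

```python
from typing import Dict, Set

GIVE = {"release", "vol-write", "fork"}  # if t ∈ LS(d), add the target
TAKE = {"acquire", "vol-read", "join"}   # if the target ∈ LS(d), add t


def apply_rule(ls: Set[str], thread: str, op: str, target: str) -> None:
    """Apply one lockset update rule to a single lockset, in place."""
    if op in GIVE and thread in ls:
        ls.add(target)
    elif op in TAKE and target in ls:
        ls.add(thread)


class EagerGoldilocks:
    """Naive variant: every synchronization action visits every lockset."""

    def __init__(self) -> None:
        self.ls: Dict[str, Set[str]] = {}

    def sync(self, thread: str, op: str, target: str) -> None:
        for lockset in self.ls.values():
            apply_rule(lockset, thread, op, target)

    def access(self, thread: str, var: str) -> bool:
        """Return True if the access is race-free, then reset LS(var) to {thread}."""
        lockset = self.ls.get(var)
        race_free = lockset is None or thread in lockset  # first access: assumed race-free
        self.ls[var] = {thread}
        return race_free


g = EagerGoldilocks()
first = g.access("T1", "d")        # T1 becomes the only owner of d
g.sync("T1", "vol-write", "v")     # T1 ∈ LS(d), so v is added
g.sync("T2", "vol-read", "v")      # v ∈ LS(d), so T2 is added
handoff = g.access("T2", "d")      # race-free: T2 ∈ LS(d)
stranger = g.access("T3", "d")     # T3 never synchronized with T2
```

Here `handoff` is race-free because ownership travelled through the volatile v, while `stranger` is a race because T3 never entered LS(d).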

Goldilocks
• T1: a := IntBox(); b := IntBox()
  • LS(o1.x) = ∅
• T1: acquire(L1); a.x++
  • First access
  • LS(o1.x) = {T1}
• T1: release(L1)
  • (T1 ∈ LS) → (add L1 to LS)
  • LS(o1.x) = {T1, L1}
• T2: acquire(L1)
  • (L1 ∈ LS) → (add T2 to LS)
  • LS(o1.x) = {T1, L1, T2}
• T2: acquire(L2)
  • (L2 ∈ LS) → (add T2 to LS): L2 is not in LS, so no change
  • LS(o1.x) = {T1, L1, T2}
• T2: tmp := a; a := b; b := tmp; release(L1)
  • (T2 ∈ LS) → (add L1 to LS)
  • LS(o1.x) = {T1, L1, T2}
• T2: release(L2)
  • (T2 ∈ LS) → (add L2 to LS)
  • LS(o1.x) = {T1, L1, T2, L2}
• T3: acquire(L2)
  • (L2 ∈ LS) → (add T3 to LS)
  • LS(o1.x) = {T1, L1, T2, L2, T3}
• T3: b.x++
  • (T3 ∈ LS) → (No race)
  • LS(o1.x) = {T3}
• T3: release(L2)
  • (T3 ∈ LS) → (add L2 to LS)
  • LS(o1.x) = {T3, L2}

Uniform handling of many scenarios
• Dynamically changing locksets
• Permanent/temporary thread-locality
• Container-protected objects
  • Lockset of contained variable changes although variable is not touched
• Synchronization using wait/notify(All)
  • No additional lockset update rules
• Synchronization using volatile variables
  • Conditional branches on volatile variables
• Classes in java.util.concurrent package
  • Semaphores, barriers, ...

Implementation
• Naive implementation too inefficient
  • acquire(l) by thread t
    • For each variable d: if (l ∈ LS(d)) → (add t to LS(d))
Implementation features
• Short-circuit checks before lockset computation
  • Handle thread-locality, unique protecting lock, ...
• Lazy evaluation of locksets
  • Apply update rules only at variable access
  • Keep synchronization actions in a global event list
  • Order of events consistent with →p and →sw
• Implicit, shared representation of locksets
  • Use temporary locksets only at access
[Figure: Global event list with entries (T1, acquire, l), (T2, vol-write, v), (T1, release, l), (T2, acquire, l), (T1, vol-read, v), (T2, release, l); variables x and y each reference a position in the list.]

Implementation in Kaffe
• In the Kaffe Virtual Machine [http://www.kaffe.org]
  • Clean room implementation of JVM in C
  • Full Java platform functionality
• Instrumented byte-code interpreter
  • Functions executing instructions for synchronization, heap access
• Per thread checking
  • Each thread checks its own actions
  • Communication via global event list
  • Applicable to multiprocessors
Handle-Action (Thread t, Action α)
  IF α is a synchronization action
    Add α to the global event list
  ELSE IF α is an access to variable d
    IF all short-circuit checks fail
      Apply-Lockset-Rules(t, d)

Short-circuit checks
• Sufficient, constant time checks for H-B
  • If any of them succeed: No race ⇒ No need for lockset computation
• Track owner thread
  • For each variable d, keep the last accessor thread
  • owner-thread(d): Current accessor thread
  • Succeeds when d remains thread-local
• Track single unique lock
  • For each variable d, guess a unique protecting lock
  • single-lock(d): Random lock held by current accessor thread
  • Succeeds as long as d is accessed while holding same lock
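A minimal sketch of the two checks, under stated assumptions: `VarInfo` and `short_circuit` are illustrative names, the guessed lock is picked with `min` rather than at random so that the run is reproducible, and the guess is replaced whenever the current accessor does not hold it.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class VarInfo:
    owner_thread: Optional[str] = None  # last accessor of the variable
    single_lock: Optional[str] = None   # guessed lock, held at the last access


def short_circuit(info: VarInfo, thread: str, held: Set[str]) -> bool:
    """Return True if a constant-time check proves the access race-free.

    False only means that the full lockset computation is still needed.
    """
    same_owner = info.owner_thread == thread
    same_lock = info.single_lock is not None and info.single_lock in held
    # Update the bookkeeping for the next access.
    info.owner_thread = thread
    if info.single_lock not in held:
        info.single_lock = min(held) if held else None  # assumption: deterministic pick
    return same_owner or same_lock


info = VarInfo()
results = [
    short_circuit(info, "T1", set()),          # first access: nothing to compare with
    short_circuit(info, "T1", set()),          # still thread-local to T1
    short_circuit(info, "T2", {"L1"}),         # new thread, no guessed lock yet
    short_circuit(info, "T3", {"L1", "L2"}),   # holds the guessed lock L1
    short_circuit(info, "T1", {"L2"}),         # L1 not held: fall back to locksets
]
```

Keeping the guessed lock only while each accessor holds it means a successful lock check always compares against a lock that was held at the previous access.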

Lazy evaluation of locksets
• Global event list for the example
  • (T1, alloc, o1), (T1, alloc, o2)
  • (T1, acquire, L1), (T1, release, L1)
  • (T2, acquire, L1), (T2, acquire, L2), (T2, release, L1), (T2, release, L2)
  • (T3, acquire, L2), (T3, release, L2)
• After T1's a.x++, o1.x references the event list position following (T1, acquire, L1)
• At T3's access b.x++
  • Initialize LS(o1.x) = { T1 }
  • Repeat: Apply lockset rules on LS(o1.x), until last synchronization action by T3
  • Check whether T3 ∈ LS(o1.x)
• Garbage collect unreferenced events
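The replay steps above can be sketched as follows. Synchronization actions are only appended to a list, and the rules are applied to a temporary lockset when a variable is accessed. `LazyGoldilocks` is an illustrative name, a first access is assumed to be race-free, alloc events are left out, and garbage collection of unreferenced events is omitted.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str]  # (thread, operation, lock or volatile or thread)

GIVE = {"release", "vol-write", "fork"}  # if t ∈ LS, add the target
TAKE = {"acquire", "vol-read", "join"}   # if the target ∈ LS, add t


class LazyGoldilocks:
    def __init__(self) -> None:
        self.events: List[Event] = []            # global event list
        self.info: Dict[str, Tuple[str, int]] = {}  # var -> (last accessor, list position)

    def sync(self, thread: str, op: str, target: str) -> None:
        self.events.append((thread, op, target))  # constant time, no lockset touched

    def access(self, thread: str, var: str) -> bool:
        """Replay the events since the last access to var; True means race-free."""
        previous = self.info.get(var)
        self.info[var] = (thread, len(self.events))  # LS(var) = {thread} from here on
        if previous is None:
            return True                              # first access: assumed race-free
        owner, start = previous
        lockset = {owner}                            # temporary lockset
        for t, op, target in self.events[start:]:
            if op in GIVE and t in lockset:
                lockset.add(target)
            elif op in TAKE and target in lockset:
                lockset.add(t)
        return thread in lockset


g = LazyGoldilocks()
g.sync("T1", "acquire", "L1")
t1_ok = g.access("T1", "o1.x")     # a.x++
g.sync("T1", "release", "L1")
g.sync("T2", "acquire", "L1")
g.sync("T2", "acquire", "L2")
g.sync("T2", "release", "L1")
g.sync("T2", "release", "L2")
g.sync("T3", "acquire", "L2")
t3_ok = g.access("T3", "o1.x")     # b.x++ after the swap
g.sync("T3", "release", "L2")
t4_ok = g.access("T4", "o1.x")     # hypothetical thread with no synchronization
```

Each variable keeps only its last accessor and a position in the list, which is one way to read the slide's "implicit, shared representation of locksets".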

Evaluation
• Algorithms evaluated
  • Goldilocks
  • Eraser with state machines
  • Vector-clocks
Benchmarks
• Microbenchmarks: Interesting, artificial programs
  • Multiset: Well-protected insertions, deletions, lookups of integers
  • SharedSpot: Contains variables each protected by a unique lock
  • LocalSpot: Contains thread-local variables
• Larger programs for performance comparison
  • Raja, SciMark, Grande

Microbenchmarks
• Interesting cases: Thread-locality, variables protected by single unique locks
• Short-circuit checks work
• Per-access cost increases very slowly with # of threads

Large benchmarks
• Goldilocks much faster than vector clocks
• Performance comparable to Eraser
• Precision comes at little or no extra cost

Conclusions
• The Goldilocks algorithm: A precise lockset-based characterization of the happens-before relation
  • Sound: Detects all apparent races
  • Precise: No false alarms
  • Efficient: Short-circuit checks + Lazy evaluation
• Handles all synchronization disciplines uniformly
  • Thread-locality, dynamically changing locksets, volatile variable-based synchronization, ...
• Applicable to both model checking & runtime monitoring
• Future work
  • Dynamic & static methods based on Goldilocks
  • Tolerable cost for continuous runtime monitoring
  • Tight integration of static methods and Goldilocks
