
Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets

Tayfun Elmas (1), Shaz Qadeer (2), Serdar Tasiran (1). (1) Koç University, İstanbul, Turkey; (2) Microsoft Research, Redmond, WA. FATES/RV’06, August 15-16, Seattle, WA.


Presentation Transcript


  1. Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets
     Tayfun Elmas (1), Shaz Qadeer (2), Serdar Tasiran (1)
     (1) Koç University, İstanbul, Turkey; (2) Microsoft Research, Redmond, WA
     FATES/RV’06, August 15-16, Seattle, WA

  2. Our goal
     • Continuous runtime monitoring of concurrent Java programs
     • Target: Race conditions
     • Criteria
       • Efficiency: Tolerable impact on performance
       • Precision: Prevent false alarms
     • The Java Memory Model (JMM) [Manson et al., POPL’05]
       • “Two accesses form a data race in an execution of a program if
         • they conflict,
         • they are from different threads and
         • they are not ordered by happens-before (H-B).”
     • Exact H-B computation ⇒ precise race detection

  3. Existing dynamic approaches
     • Vector-clock algorithms [Mattern, 1989]
       • Vector clock: For each thread and variable, a vector of logical clocks
       • Vector has size T = #threads
       • Vector updated at each synchronization operation
       • Precise but inefficient in some cases
         • O(T) computation at each synchronization operation
       • Other algorithms use cheaper checks for well-protected variables
         • Thread-local variables, variables protected by single locks
     • Lockset algorithms [Savage et al., 1997]
       • Lockset: A set of locks protecting access to variable d
       • Lockset update rules specific to a synchronization discipline
       • Efficient, intuitive, but imprecise
         • False alarms: Synchronization discipline violated but no race occurred
       • Additional mechanisms to reduce false alarms
         • State machines for object initialization, escape, thread-locality
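To make the O(T) cost on the slide above concrete, here is a minimal sketch of vector-clock joining and comparison. It assumes threads are indexed 0..T-1 and that a release copies the releasing thread's clock into the lock; the function names `vc_join` and `vc_leq` are hypothetical and do not come from the talk.

```python
def vc_join(a, b):
    """Component-wise maximum: the O(T) work done at a synchronization operation."""
    return [max(x, y) for x, y in zip(a, b)]


def vc_leq(a, b):
    """True if the event stamped `a` is ordered before (or equal to) the one stamped `b`."""
    return all(x <= y for x, y in zip(a, b))


# Two threads, so T = 2. Thread 0 writes d at clock [1, 0], then releases a lock.
write_clock = [1, 0]
lock_clock = vc_join([0, 0], write_clock)    # assumed: release merges the clock into the lock

# Thread 1 acquires the same lock, so its clock absorbs the lock's clock.
t1_synced = vc_join([0, 1], lock_clock)
# A thread 1 that never synchronized keeps an unrelated clock.
t1_unsynced = [0, 1]

ordered = vc_leq(write_clock, t1_synced)     # the write is ordered before the access
racy = not vc_leq(write_clock, t1_unsynced)  # the two accesses are unordered
```

Both helpers walk all T components, which is the per-synchronization cost the slide refers to.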

  4. Our work
     • The Goldilocks algorithm
       • Novel lockset-based method that precisely computes H-B
       • As efficient as other lockset algorithms
       • As precise as vector-clocks
     • Uniformly captures all synchronization disciplines
       • Our locksets contain locks, volatile variables, thread ids
     • Theorem: When thread t accesses variable d, there is no race iff the lockset of d at that point contains t
       • Sound: Detects all apparent races that occur in execution
       • Precise: Race reported ⇒ Two accesses not ordered by H-B
         • No false alarms
         • No alarms about potential races in similar executions

  5. Outline
     • The Goldilocks algorithm
     • Implementation
     • Evaluation
     • Conclusions

  6. Example
     class IntBox { int x; }
     Global variables: a, b: IntBox; o1.x, o2.x: int
     T1: a := IntBox(); b := IntBox(); acquire(L1); a.x++; release(L1)
     T2: acquire(L1); acquire(L2); tmp := a; a := b; b := tmp; release(L1); release(L2)
     T3: acquire(L2); b.x++; release(L2)
     [Figure: references a and b and objects o1 and o2, before and after T2's swap, with locks L1 and L2.]

  7. Eraser
     a := IntBox()                    LS(o1.x) = {all locks}
     b := IntBox()
     T1: acquire(L1)
         a.x++                        Check: LS(o1.x) ∩ LH(T1) = ∅?
                                      LS(o1.x) = {all locks} ∩ {L1} = {L1}
         release(L1)
     T2: acquire(L1)
         acquire(L2)
         tmp := a; a := b; b := tmp   No access to o1.x, LS(o1.x) not modified
         release(L1)
         release(L2)
     T3: acquire(L2)
         b.x++                        Check: LS(o1.x) ∩ LH(T3) = ∅?
                                      LS(o1.x) = {L1} ∩ {L2} = ∅
                                      Race reported!
         release(L2)
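The Eraser trace above can be replayed with a few lines of set arithmetic. This is a minimal sketch of basic lockset refinement only, without the state machines mentioned earlier in the talk; `eraser_access` and `ALL_LOCKS` are hypothetical names.

```python
def eraser_access(lockset, locks_held):
    """Intersect the candidate lockset with the locks held at this access.

    Returns (refined_lockset, race_reported).
    """
    refined = lockset & locks_held
    return refined, len(refined) == 0


ALL_LOCKS = frozenset({"L1", "L2"})

ls = ALL_LOCKS                             # LS(o1.x) starts as the set of all locks
ls, race_t1 = eraser_access(ls, {"L1"})    # T1: a.x++ while holding L1
# T2 swaps a and b but never touches o1.x, so LS(o1.x) is not modified.
ls, race_t3 = eraser_access(ls, {"L2"})    # T3: b.x++ (now o1.x) while holding L2
```

The second access empties the lockset, so a race is reported even though T2's swap under both locks ordered the two accesses: the false alarm the slide illustrates.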

  8. The happens-before relation
     • Happens-before in JMM: hb
       • Transitive closure of
         • Program orders of threads: p
         • Synchronizes-with: sw
           • release(l) sw acquire(l)
           • vol-write(v) sw vol-read(v)
           • fork(t) hb (action of t)
           • (action of t) hb join(t)
     [Figure: the example execution annotated with edges. p edges link the actions within each thread; sw edges link T1's release(L1) to T2's acquire(L1) and T2's release(L2) to T3's acquire(L2); hence T1's a.x++ hb T3's b.x++.]
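As an illustration of the definition above, the sketch below builds hb for the example execution as the transitive closure of program order and release/acquire synchronizes-with edges. The trace encoding and the name `happens_before` are assumptions made for this sketch; volatile, fork and join edges are omitted for brevity.

```python
from itertools import product

# (thread, action, target) in execution order; only locks and accesses are modeled.
trace = [
    ("T1", "acquire", "L1"), ("T1", "access", "o1.x"), ("T1", "release", "L1"),
    ("T2", "acquire", "L1"), ("T2", "acquire", "L2"),
    ("T2", "release", "L1"), ("T2", "release", "L2"),
    ("T3", "acquire", "L2"), ("T3", "access", "o1.x"), ("T3", "release", "L2"),
]


def happens_before(events):
    """Return a matrix hb where hb[i][j] is True if event i happens before event j."""
    n = len(events)
    hb = [[False] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if i >= j:
            continue
        ti, ai, xi = events[i]
        tj, aj, xj = events[j]
        program_order = ti == tj
        synchronizes_with = ai == "release" and aj == "acquire" and xi == xj
        hb[i][j] = program_order or synchronizes_with
    # Warshall's algorithm; product() varies its first component slowest, so k is outermost.
    for k, i, j in product(range(n), repeat=3):
        if hb[i][k] and hb[k][j]:
            hb[i][j] = True
    return hb


first = trace.index(("T1", "access", "o1.x"))
second = trace.index(("T3", "access", "o1.x"))
ordered_with_t2 = happens_before(trace)[first][second]

# Without T2's hand-off there is no chain of sw edges between the two accesses.
no_t2 = [e for e in trace if e[0] != "T2"]
ordered_without_t2 = happens_before(no_t2)[
    no_t2.index(("T1", "access", "o1.x"))][no_t2.index(("T3", "access", "o1.x"))]
```

The cubic closure is only meant to mirror the definition; it is not how a runtime monitor would compute hb.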

  9. Goldilocks intuition
     • LS: (Variables) → ℘(Threads ∪ Locks ∪ Volatiles)
     • Update rules maintain invariants:
       • Thread t ∈ LS(d) ⇒ t is owner of d
         • Accesses to d by t are race-free
       • Lock l ∈ LS(d) ⇒ acquire l to become owner of d
       • Volatile v ∈ LS(d) ⇒ read v to become owner of d
     • When t accesses d: Race-free iff (t ∈ LS(d))
     • After t accesses d: LS(d) = { t }
       • t is the only owner of d
       • Other threads: Must synchronize with t in order to become an owner of d

  10. Lockset update rules
      • Ownership transfer between threads
      • LS(d) grows through synchronization actions
        • release(l) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add l to LS(d))
        • acquire(l) by t: For each variable d: if (l ∈ LS(d)) ⇒ (add t to LS(d))
        • volatile-write(v) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add v to LS(d))
        • volatile-read(v) by t: For each variable d: if (v ∈ LS(d)) ⇒ (add t to LS(d))
        • fork(s) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add s to LS(d))
        • join(s) by t: For each variable d: if (s ∈ LS(d)) ⇒ (add t to LS(d))
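The six rules above share one shape, which the following sketch exploits. It is a naive, eager version that visits every variable at each synchronization action; the names `on_sync` and `on_access` are hypothetical, threads, locks and volatiles are assumed to have distinct string names, and a variable with no lockset yet is treated as not accessed before.

```python
def on_sync(locksets, thread, kind, target):
    """Apply one synchronization action by `thread` to every variable's lockset."""
    if kind in ("release", "volatile-write", "fork"):
        have, add = thread, target      # t in LS(d) => add l, v, or the forked thread s
    elif kind in ("acquire", "volatile-read", "join"):
        have, add = target, thread      # l, v, or the joined thread s in LS(d) => add t
    else:
        raise ValueError(f"unknown synchronization action: {kind}")
    for lockset in locksets.values():
        if have in lockset:
            lockset.add(add)


def on_access(locksets, thread, var):
    """Return True if the access is race-free, then make `thread` the only owner."""
    lockset = locksets.get(var)
    race_free = not lockset or thread in lockset   # assumed: a first access cannot race
    locksets[var] = {thread}
    return race_free


# Ownership of d is handed from T1 to T2 through a volatile variable named flag.
locksets = {}
first = on_access(locksets, "T1", "d")               # LS(d) = {T1}
on_sync(locksets, "T1", "volatile-write", "flag")    # LS(d) = {T1, flag}
on_sync(locksets, "T2", "volatile-read", "flag")     # LS(d) = {T1, flag, T2}
handoff = on_access(locksets, "T2", "d")             # T2 is an owner; LS(d) = {T2}
unsynced = on_access(locksets, "T3", "d")            # T3 never synchronized with T2
```

No lock is involved in this hand-off, yet the same rule shape covers it, which is the uniformity the talk emphasizes.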

  11. Goldilocks
      a := IntBox()                    LS(o1.x) = ∅
      b := IntBox()
      T1: acquire(L1)
          a.x++                        First access: LS(o1.x) = {T1}
          release(L1)                  (T1 ∈ LS) ⇒ (add L1 to LS); LS(o1.x) = {T1, L1}
      T2: acquire(L1)                  (L1 ∈ LS) ⇒ (add T2 to LS); LS(o1.x) = {T1, L1, T2}
          acquire(L2)                  (L2 ∉ LS): nothing added; LS(o1.x) = {T1, L1, T2}
          tmp := a; a := b; b := tmp
          release(L1)                  (T2 ∈ LS) ⇒ (add L1 to LS); LS(o1.x) = {T1, L1, T2}
          release(L2)                  (T2 ∈ LS) ⇒ (add L2 to LS); LS(o1.x) = {T1, L1, T2, L2}
      T3: acquire(L2)                  (L2 ∈ LS) ⇒ (add T3 to LS); LS(o1.x) = {T1, L1, T2, L2, T3}
          b.x++                        (T3 ∈ LS) ⇒ (No race); LS(o1.x) = {T3}
          release(L2)                  (T3 ∈ LS) ⇒ (add L2 to LS); LS(o1.x) = {T3, L2}

  12. Uniform handling of many scenarios
      • Dynamically changing locksets
      • Permanent/temporary thread-locality
      • Container-protected objects
        • Lockset of contained variable changes although variable is not touched
      • Synchronization using wait/notify(All)
        • No additional lockset update rules
      • Synchronization using volatile variables
        • Conditional branches on volatile variables
      • Classes in java.util.concurrent package
        • Semaphores, barriers, ...

  13. Outline
      • The Goldilocks algorithm
      • Implementation
      • Evaluation
      • Conclusions

  14. Implementation
      • Naive implementation too inefficient
        • acquire(l) by thread t: For each variable d: if (l ∈ LS(d)) ⇒ (add t to LS(d))
      • Implementation features
        • Short-circuit checks before lockset computation
          • Handle thread-locality, unique protecting lock, ...
        • Lazy evaluation of locksets
          • Apply update rules only at variable access
          • Keep synchronization actions in a global event list
          • Order of events consistent with p and sw
        • Implicit, shared representation of locksets
          • Use temporary locksets only at access
      [Figure: global event list (T1, acquire, l), (T2, vol-write, v), (T1, release, l), (T2, acquire, l), (T1, vol-read, v), (T2, release, l); variables x and y each refer to a position in the list.]

  15. Implementation in Kaffe
      • In the Kaffe Virtual Machine [http://www.kaffe.org]
        • Clean room implementation of JVM in C
        • Full Java platform functionality
      • Instrumented byte-code interpreter
        • Functions executing instructions for synchronization, heap access
      • Per thread checking
        • Each thread checks its own actions
        • Communication via global event list
        • Applicable to multiprocessors
      Handle-Action(Thread t, Action α)
        IF α is a synchronization action
          Add α to the global event list
        ELSE IF α is an access to variable d
          IF all short-circuit checks fail
            Apply-Lockset-Rules(t, d)
      [Figure: global event list (T1, acquire, l), (T2, vol-write, v), (T1, release, l), (T2, acquire, l), (T1, vol-read, v), (T2, release, l).]

  16. Short-circuit checks
      • Sufficient, constant time checks for H-B
        • If any of them succeed: No race ⇒ No need for lockset computation
      • Track owner thread
        • For each variable d, keep the last accessor thread
        • owner-thread(d): Current accessor thread
        • Succeeds when d remains thread-local
      • Track single unique lock
        • For each variable d, guess a unique protecting lock
        • single-lock(d): Random lock held by current accessor thread
        • Succeeds as long as d is accessed while holding same lock
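One way the two checks above could be coded is sketched below. `VarInfo` and `short_circuit` are hypothetical names; the slide picks a random held lock as the guess, whereas this sketch takes the first lock in the list it is given so that the example is deterministic. A result of False only means that the full lockset computation is still needed.

```python
class VarInfo:
    """Per-variable state for the constant-time checks."""

    def __init__(self):
        self.owner_thread = None   # last thread that accessed the variable
        self.single_lock = None    # guessed lock, held by the last accessor at its access


def short_circuit(info, thread, locks_held):
    """Return True if a constant-time check shows the access is ordered after the last one."""
    same_thread = info.owner_thread == thread                  # ordered by program order
    same_lock = info.single_lock is not None and info.single_lock in locks_held
    info.owner_thread = thread
    if info.single_lock not in locks_held:
        # Re-guess: any lock currently held by this accessor, or None if it holds none.
        info.single_lock = next(iter(locks_held), None)
    return same_thread or same_lock


info = VarInfo()
r1 = short_circuit(info, "T1", ["L1"])        # first access: nothing to compare against
r2 = short_circuit(info, "T1", [])            # same thread as the last accessor
r3 = short_circuit(info, "T2", ["L1"])        # different thread, guess was reset to None
r4 = short_circuit(info, "T3", ["L1", "L2"])  # holds L1, which T2 held at its access
r5 = short_circuit(info, "T1", ["L2"])        # neither check applies
```

The lock check is sufficient because the guessed lock was held by the previous accessor during its access, so its release precedes the current thread's acquire.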

  17. Lazy evaluation of locksets
      Global event list for the example:
        (T1, alloc, o1), (T1, alloc, o2), (T1, acquire, L1), (T1, release, L1), (T2, acquire, L1), (T2, acquire, L2), (T2, release, L1), (T2, release, L2), (T3, acquire, L2), (T3, release, L2)
      o1.x refers to the entry (T1, acquire, L1), the last event in the list at T1's access
      At T3's access b.x++:
        • Initialize LS(o1.x) = { T1 }
        • Repeat: Apply lockset rules on LS(o1.x), until last synchronization action by T3
        • Check whether T3 ∈ LS(o1.x)
      • Garbage collect unreferenced events
      [Figure: animation of the example program next to the growing global event list.]
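The lazy scheme above can be sketched as follows: synchronization actions are only appended to a list, and the lockset of a variable is rebuilt at an access by replaying the events recorded since its previous access. The names `record_sync`, `access`, `events` and `last_access` are hypothetical, the alloc events are left out, and short-circuit checks and garbage collection of unreferenced events are omitted.

```python
events = []        # global event list of (thread, kind, target)
last_access = {}   # variable -> (last accessor thread, length of events at that access)

PUBLISH = {"release", "volatile-write", "fork"}   # rules of the form: t in LS => add target


def record_sync(thread, kind, target):
    """A synchronization action only appends to the global event list."""
    events.append((thread, kind, target))


def access(thread, var):
    """Rebuild LS(var) from the event list and return True if the access is race-free."""
    race_free = True                               # assumed: a first access cannot race
    if var in last_access:
        owner, start = last_access[var]
        lockset = {owner}                          # temporary lockset, used only here
        for t, kind, target in events[start:]:
            have, add = (t, target) if kind in PUBLISH else (target, t)
            if have in lockset:
                lockset.add(add)
        race_free = thread in lockset
    last_access[var] = (thread, len(events))
    return race_free


# Replay of the example: T1 accesses o1.x, T2 swaps a and b, T3 accesses o1.x via b.
record_sync("T1", "acquire", "L1")
t1_ok = access("T1", "o1.x")
record_sync("T1", "release", "L1")
record_sync("T2", "acquire", "L1")
record_sync("T2", "acquire", "L2")
record_sync("T2", "release", "L1")
record_sync("T2", "release", "L2")
record_sync("T3", "acquire", "L2")
t3_ok = access("T3", "o1.x")        # replays six events; T3 ends up in the lockset
t1_again = access("T1", "o1.x")     # no events since T3's access, so LS = {T3}
```

Storing only a position per variable is what makes the representation implicit and shared: all variables read the same list, and no per-variable set persists between accesses.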

  18. Outline
      • The Goldilocks algorithm
      • Implementation
      • Evaluation
      • Conclusions

  19. Evaluation
      • Algorithms evaluated
        • Goldilocks
        • Eraser with state machines
        • Vector-clocks
      Benchmarks
      • Microbenchmarks: Interesting, artificial programs
        • Multiset: Well-protected insertions, deletions, lookups of integers
        • SharedSpot: Contains variables each protected by a unique lock
        • LocalSpot: Contains thread-local variables
      • Larger programs for performance comparison
        • Raja, SciMark, Grande

  20. Microbenchmarks
      • Interesting cases: Thread-locality, variables protected by single unique locks
        • Short-circuit checks work
      • Per-access cost increases very slowly with # of threads

  21. Large benchmarks
      • Goldilocks much faster than vector clocks
      • Performance comparable to Eraser
      • Precision comes at little or no extra cost

  22. Conclusions
      • The Goldilocks algorithm: A precise lockset-based characterization of the happens-before relation
        • Sound: Detects all apparent races
        • Precise: No false alarms
        • Efficient: Short-circuit checks + Lazy evaluation
      • Handles all synchronization disciplines uniformly
        • Thread-locality, dynamically changing locksets, volatile variable-based synchronization, ...
      • Applicable to both model checking & runtime monitoring
      • Future work
        • Dynamic & static methods based on Goldilocks
        • Tolerable cost for continuous runtime monitoring
        • Tight integration of static methods and Goldilocks
