On-the-Fly Data-Race Detection in Multithreaded Programs Prepared by Eli Pozniansky under Supervision of Prof. Assaf Schuster
Table of Contents • What is a Data-Race? • Why Are Data-Races Undesired? • How Can Data-Races Be Prevented? • Can Data-Races Be Easily Detected? • Feasible and Apparent Data-Races • Complexity of Data-Race Detection • NP and Co-NP • Program Execution Model & Ordering Relations • Complexity of Computing Ordering Relations • Proof of NP/Co-NP Hardness
Table of Contents Cont. • So How Can Data-Races Be Detected? • Lamport’s Happens-Before Approximation • Approaches to Detection of Apparent Data-Races: • Static Methods • Dynamic Methods: • Post-Mortem Methods • On-The-Fly Methods
Table of Contents Cont. • Closer Look at Dynamic Methods: • DJIT+ • Local Time Frames • Vector Time Frames • Logging Mechanism • Data-Race Detection Using Vector Time Frames • Which Accesses to Check? • Which Time Frames to Check? • Access History & Algorithm • Coherency • Results
Table of Contents Cont. • Lockset • Locking Discipline • The Basic Algorithm & Explanation • Which Accesses to Check? • Improving Locking Discipline • Initialization • Read-Sharing • Barriers • False Alarms • Results • Combining DJIT+ and Lockset • Summary • References
What is a Data-Race? • Concurrent accesses to a shared location by two or more threads, where at least one access is a write. Example (variable X is global and shared) – Thread 1: X=1; Z=2. Thread 2: T=Y; T=X. Usually indicative of a bug!
Why Are Data-Races Undesired? • Programs with data-races: • Usually demonstrate unexpected and even non-deterministic behavior. • The outcome might depend on the specific execution order (a.k.a. thread interleaving). • Re-executing may not always produce the same results/same data-races. • Thus, such programs are hard to debug, and correct programs are hard to write.
Why Are Data-Races Undesired? – Example • Machine code for ‘X++’ is three instructions: load (reg ← X), increment (incr reg), store (X ← reg). • First interleaving – Thread 1: 1. reg1 ← X; 2. incr reg1; 3. X ← reg1. Then Thread 2: 4. reg2 ← X; 5. incr reg2; 6. X ← reg2. • Second interleaving – Thread 1: 1. reg1 ← X; 2. incr reg1. Thread 2: 3. reg2 ← X; 4. incr reg2; 5. X ← reg2. Then Thread 1: 6. X ← reg1. • At the beginning: X=0. At the end: X=1 or X=2? Depends on the scheduling order.
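The two interleavings above can be replayed deterministically without real threads. A minimal Python sketch (the step functions and `run` helper are mine, modeling each machine instruction as one atomic step over shared state):

```python
def run(schedule):
    """Execute a fixed schedule of 'machine instructions' against shared X."""
    state = {"X": 0, "reg1": 0, "reg2": 0}
    for step in schedule:
        step(state)
    return state["X"]

# The three machine instructions of 'X++', once per thread:
load1  = lambda s: s.__setitem__("reg1", s["X"])
incr1  = lambda s: s.__setitem__("reg1", s["reg1"] + 1)
store1 = lambda s: s.__setitem__("X", s["reg1"])
load2  = lambda s: s.__setitem__("reg2", s["X"])
incr2  = lambda s: s.__setitem__("reg2", s["reg2"] + 1)
store2 = lambda s: s.__setitem__("X", s["reg2"])

# First interleaving: thread 1 finishes before thread 2 starts.
first = [load1, incr1, store1, load2, incr2, store2]
# Second interleaving: both threads load X=0, so one update is lost.
second = [load1, incr1, load2, incr2, store2, store1]

assert run(first) == 2
assert run(second) == 1
```

Both schedules execute exactly the same six instructions; only the order differs, which is why the final value of X depends on the scheduler.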
Execution Order • Each thread has a different execution speed. • The speed may change over time. • For an external observer of the time axis, instructions appear in execution order. • Any order is legal. • The execution order within a single thread is called program order.
How Can Data-Races Be Prevented? • Explicit synchronization between threads: • Locks • Critical Sections • Barriers • Mutexes • Semaphores • Monitors • Events • Etc. Example – Thread 1: Lock(m); X++; Unlock(m). Thread 2: Lock(m); T=X; Unlock(m).
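The Lock(m)/Unlock(m) pattern from the slide can be sketched with Python's `threading` module (variable names follow the slide; the iteration count is arbitrary):

```python
import threading

X = 0
m = threading.Lock()

def increment(n):
    global X
    for _ in range(n):
        with m:          # Lock(m) ... Unlock(m): the load/incr/store of X
            X += 1       # now execute atomically with respect to other threads

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, no increment can be lost:
assert X == 200_000
```

Without the `with m:` line the same program may lose updates, exactly as in the two-interleaving example above.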
Synchronization – “Bad” Bank Account Example Thread 1: Deposit( amount ) { balance += amount; } Thread 2: Withdraw( amount ) { if (balance < amount) print( “Error” ); else balance –= amount; } • ‘Deposit’ and ‘Withdraw’ are not “atomic”!!! • What is the final balance after a series of concurrent deposits and withdraws?
Synchronization – “Good” Bank Account Example Thread 1: Deposit( amount ) { Lock( m ); balance += amount; Unlock( m ); } Thread 2: Withdraw( amount ) { Lock( m ); if (balance < amount) print( “Error” ); else balance –= amount; Unlock( m ); } • The bodies between Lock(m) and Unlock(m) are critical sections. • Since critical sections can never execute concurrently, this version exhibits no data-races.
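A runnable Python sketch of the “good” account (the `Account` class and method names are mine; the logic mirrors the slide's Deposit/Withdraw):

```python
import threading

class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.m = threading.Lock()   # guards balance

    def deposit(self, amount):
        with self.m:                # critical section, as in the slide
            self.balance += amount

    def withdraw(self, amount):
        with self.m:
            if self.balance < amount:
                return False        # the slide's "Error" case
            self.balance -= amount
            return True

acct = Account()
ts = [threading.Thread(target=lambda: [acct.deposit(1) for _ in range(10_000)])
      for _ in range(4)]
for t in ts:
    t.start()
for t in ts:
    t.join()
assert acct.balance == 40_000       # no deposit was lost
assert acct.withdraw(50_000) is False
```

Because both methods take the same lock, a withdraw can never observe a balance mid-update.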
Is This Enough? • Theoretically – YES • Practically – NO • What if a programmer accidentally forgets to place correct synchronization? • How can all such data-race bugs be detected in a large program? • How can redundant synchronization be eliminated?
Can Data-Races Be Easily Detected? – No! • The problem of deciding whether a given program contains potential (feasible) data-races is NP-hard [Netzer&Miller 1990] • Input size = # instructions performed • Even for only 2 threads • Even with no loops/recursion • Lots of execution orders: up to (#threads)^(thread_length·#threads) • All possible inputs should be tested as well • Side effects of the detection code can eliminate all data-races – e.g., wrapping accesses a and b each in lock(m)…unlock(m) serializes them
Feasible Data-Races • Based on the possible behavior of the program (i.e. the semantics of the program’s computation). • The actual (!) data-races that can possibly happen in some program execution. • Require full analysis of the program’s semantics to determine whether the execution could have allowed two accesses to the same shared variable to execute concurrently.
Apparent Data-Races • Approximations of the feasible data-races • Based only on the behavior of the program’s explicit synchronization (and not on program semantics) • Important, since data-races are usually the result of improper synchronization • Easier to locate • Less accurate • Exist iff at least one feasible data-race exists • Exhaustively locating all apparent data-races is still NP-hard (and, in fact, undecidable)
Apparent Data-Races Cont. • Accesses a and b to the same shared variable in some execution are ordered if there is a chain of corresponding explicit synchronization events between them. • a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so. Example – Initially: grades = oldDatabase; updated = false. Thread T.A.: grades := newDatabase; updated := true. Thread Lecturer: while (updated == false); X := grades.gradeOf(lecturersSon);
Feasible vs. Apparent [Initially F ← false] Thread 1: X++; F = true. Thread 2: if (F == true) X– –. • Apparent data-races in the execution above – 1 (on ‘F’) & 2 (on ‘X’). • Feasible data-races – 1 only!!! – No feasible execution exists in which ‘X– –’ is performed before ‘X++’ (since ‘F’ is false at start). • Protecting ‘F’ only will protect ‘X’ as well.
Feasible vs. Apparent [Initially F ← false] Thread 1: X++; Lock( m ); F = true; Unlock( m ). Thread 2: Lock( m ); T = F; Unlock( m ); if (T == true) X– –. • No feasible or apparent data-races exist under any execution order!!! • ‘F’ is protected by a lock. ‘X++’ and ‘X– –’ are always ordered and properly synchronized. • Either there is a synchronization chain of Unlock(m)–Lock(m) between ‘X++’ and ‘X– –’, or only ‘X++’ executes.
Complexity ofData-Race Detection • Exactly locating the feasible data-races is an NP-hard problem. • The apparent races, which are simpler to locate, must be detected for debugging. Apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution. The problem of exhaustively locating all apparent data-races is still NP-hard.
Reminder: NP and Co-NP • NP problems are those for which: • No polynomial solution is known. • An exponential solution exists. • A problem is NP-hard if there is a polynomial reduction from every problem in NP to it. • A problem is NP-complete if it is NP-hard and it resides in NP. • Intuitively – if the answer for the problem can only be ‘yes’/‘no’, we can either answer ‘yes’ and stop, or never stop (at least not in polynomial time).
Reminder: NP and Co-NP Cont. • The set of Co-NP problems is complementary to the set of NP problems. • Intuitively, for a Co-NP problem it is the ‘no’ answer that can be verified efficiently. • A problem that is in both NP and Co-NP is not thereby known to be in P – whether NP ∩ Co-NP = P is an open question. • The problem of checking whether a boolean formula is satisfiable (SAT) is NP-complete. • Answer ‘yes’ if a satisfying assignment for the variables was found. • Same, but not-satisfiable – Co-NP-complete.
Why Data-Race Detection is NP-Hard • Question: How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent? • Answer: We must check all execution orders of P and see. • If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. • Otherwise we should continue checking.
Program Execution Model • Consider a class of multi-threaded programs that synchronize by counting semaphores. • Program execution is described by collection of events and two relations over the events. • Synchronization event – instance of some synchronization operation (e.g. signal, wait). • Computation event – instance of a group of statements in same thread, none of which are synchronization operations (e.g. x=x+1).
Program Execution Model – Events’ Relations • Temporal ordering relation – a T→ b means that a completes before b begins (i.e. the last action of a can affect the first action of b). • Shared data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the accesses modifies the variable. • Indicates when one event causally affects another.
Program Execution Model – Program Execution • Program execution P – a triple <E,T→,D→>, where E is a finite set of events, and T→ and D→ are the above relations, satisfying the following axioms: • A1: T→ is an irreflexive partial order (a T↛ a). • A2: If a T→ b, b T↮ c and c T→ d, then a T→ d. • A3: If a D→ b then b T↛ a. • Notes: • ↛ is a shorthand for ¬(a→b). • ↮ is a shorthand for ¬(a→b)⋀¬(b→a). • Notice that A1 and A2 imply transitivity of T→.
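On a small explicit execution, axioms A1–A3 can be checked mechanically. A sketch (the helper name and the pair-set encoding of T→ and D→ are mine):

```python
def satisfies_axioms(events, T, D):
    """Check A1-A3 for relations T (temporal) and D (shared-data dependence),
    each given as a set of ordered pairs of events."""
    # A1: T-> is irreflexive.
    if any((a, a) in T for a in events):
        return False
    # A2: a T-> b, b unordered with c, c T-> d  =>  a T-> d.
    for (a, b) in T:
        for (c, d) in T:
            unordered = (b, c) not in T and (c, b) not in T
            if unordered and (a, d) not in T:
                return False
    # A3: a D-> b  =>  not (b T-> a).
    return all((b, a) not in T for (a, b) in D)

# a -> b -> c with the transitive edge present: all axioms hold.
ok = satisfies_axioms({"a", "b", "c"},
                      {("a", "b"), ("b", "c"), ("a", "c")},
                      {("a", "c")})
# Dropping (a, c) violates A2 -- which, as the slide notes, implies transitivity.
bad = satisfies_axioms({"a", "b", "c"}, {("a", "b"), ("b", "c")}, set())
assert ok and not bad
```

Note that when b = c, the A2 check degenerates to ordinary transitivity, confirming the slide's remark that A1 and A2 imply T→ is transitive.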
Program Execution Model –Feasible Program Execution • Feasible program execution for P – execution of a program that: • performs exactly the same events as P • May exhibit different temporal ordering. • Definition: P’=<E’,T’→,D’→> is a feasible program execution for P=<E,T→,D→> (potentially occurred) if • F1: E’=E (i.e. exactly the same events), and • F2: P’ satisfies the axioms A1 - A3 of the model, and • F3: a D→ b ⇒ a D’→ b (i.e. same data dependencies) • Note: Any execution with same shared-data dependencies as P will execute exactly the same events as P.
Program Execution Model – Ordering Relations • Given a program execution, P=<E,T→,D→>, and the set, F(P), of feasible program executions for P, the following relations are defined: • a MHB→ b (must-have-happened-before): a T’→ b in every execution in F(P). • a MCW↔ b (must-have-concurrent-with): a and b are unordered in every execution in F(P). • a MOW↔ b (must-have-ordered-with): a and b are ordered, one way or the other, in every execution in F(P). • a CHB→ b (could-have-happened-before): a T’→ b in at least one execution in F(P). • a CCW↔ b (could-have-concurrent-with): a and b are unordered in at least one execution in F(P). • a COW↔ b (could-have-ordered-with): a and b are ordered in at least one execution in F(P). • These summarize the temporal orderings present in the feasible program executions.
Program Execution Model –Ordering Relations - Explanation • The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P). • The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P). • The happened-before relations show events that execute in a specific order. • The concurrent-with relations show events that execute concurrently. • The ordered-with relations show events that execute in either order but not concurrently.
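When F(P) is given explicitly, the must-have and could-have relations fall out directly from their definitions. A toy Python sketch (my own encoding: each feasible execution is represented by its temporal-ordering relation T'→ as a set of ordered pairs):

```python
def must_could(events, feasible):
    """Compute MHB, CHB, MCW, CCW over an explicit list of feasible
    executions, each given as its T'-> relation (a set of ordered pairs)."""
    def hb(a, b, T):   # a happened before b in this execution
        return (a, b) in T
    def cw(a, b, T):   # a and b were concurrent in this execution
        return not hb(a, b, T) and not hb(b, a, T)
    pairs = [(a, b) for a in events for b in events if a != b]
    MHB = {p for p in pairs if all(hb(*p, T) for T in feasible)}
    CHB = {p for p in pairs if any(hb(*p, T) for T in feasible)}
    MCW = {p for p in pairs if all(cw(*p, T) for T in feasible)}
    CCW = {p for p in pairs if any(cw(*p, T) for T in feasible)}
    return MHB, CHB, MCW, CCW

# Two feasible executions: in the first, c is concurrent with a and b;
# in the second, everything is totally ordered a -> b -> c.
F = [{("a", "b")},
     {("a", "b"), ("b", "c"), ("a", "c")}]
MHB, CHB, MCW, CCW = must_could({"a", "b", "c"}, F)
assert MHB == {("a", "b")}      # guaranteed in every feasible execution
assert ("b", "c") in CHB        # possible in at least one
assert MCW == set()             # nothing is concurrent in *all* executions
assert ("a", "c") in CCW        # a and c could have run concurrently
```

The hardness results below say precisely that for programs with counting semaphores, F(P) cannot be enumerated like this in polynomial time.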
Complexity of Computing Ordering Relations • The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard. • The problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard. • Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.
Proof of Theorem 1 – Notes • The proof is a reduction from 3CNFSAT such that any boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction. • The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete. • The presented proof is only for the must-have-happened-before (MHB) relation. • Proofs for the other relations are analogous. • The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).
Proof of Theorem 1 –3CNFSAT • An instance of 3CNFSAT is given by: • A set of n variables, V={X1,X2, …,Xn}. • A boolean formula B consisting of conjunction of m clauses, B=C1⋀C2⋀…⋀Cm. • Each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals. • Each literal Lk is any variable from V or its negation - Lk=Xi or Lk=⌐Xi. • Example: B=(X1⋁X2⋁⌐X3)⋀(⌐X2⋁⌐X5⋁X6)⋀(X1⋁X4⋁⌐X5)
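For intuition about what the reduction encodes, satisfiability of a small 3CNF instance can be checked by brute force. A sketch (my own encoding: a literal is a pair (variable index, negated?)) using the slide's example formula:

```python
from itertools import product

# B = (X1 v X2 v -X3) ^ (-X2 v -X5 v X6) ^ (X1 v X4 v -X5) from the slide.
B = [((1, False), (2, False), (3, True)),
     ((2, True), (5, True), (6, False)),
     ((1, False), (4, False), (5, True))]

def satisfiable(formula, n_vars):
    """Try all 2^n truth assignments; a literal (i, neg) is true iff
    the value of Xi differs from its negation flag."""
    for bits in product([False, True], repeat=n_vars):
        assign = dict(enumerate(bits, start=1))
        if all(any(assign[i] != neg for (i, neg) in clause)
               for clause in formula):
            return True
    return False

assert satisfiable(B, 6)
# A padded encoding of X1 ^ -X1 is unsatisfiable:
contradiction = [((1, False),) * 3, ((1, True),) * 3]
assert not satisfiable(contradiction, 1)
```

The reduction below replaces this exhaustive loop with threads and semaphores, so that one program execution "guesses" an assignment nondeterministically.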
Proof of Theorem 1 –Idea of the Proof • Given an instance of 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0). • The execution of this program simulates a nondeterministic evaluation of B. • Semaphores are used to represent the truth values of each variable and clause. • The execution exhibits certain orderings iff B is not satisfiable.
Proof of Theorem 1 – The Construction per Variable • For each variable, Xi, the following three threads are constructed: Thread 1: wait( Ai ); signal( Xi ); … ; signal( Xi ). Thread 2: wait( Ai ); signal( not-Xi ); … ; signal( not-Xi ). Thread 3: signal( Ai ); wait( Pass2 ); signal( Ai ). • “…” indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ⌐Xi) in the formula B.
Proof of Theorem 1 –The Construction per Variable • The semaphores Xi and not-Xi are used to represent the truth value of variable Xi. • Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to variable Xi. • The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to concurrent wait(Ai) operations in two leftmost threads).
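The "one of two, but not both" behavior of the gadget can be demonstrated with Python semaphores. A small sketch (the timeout and the `winners` bookkeeping are my additions for the demo; in the real construction the loser simply blocks until the second pass):

```python
import threading

Ai = threading.Semaphore(0)   # all semaphores start at 0, as in the proof
winners = []
lock = threading.Lock()       # protects the winners list itself

def assigner(name):
    # wait(Ai): only the thread that wins the race proceeds in pass 1.
    # (The timeout stands in for "stays blocked until pass 2".)
    if Ai.acquire(timeout=1.0):
        with lock:
            winners.append(name)

t_true  = threading.Thread(target=assigner, args=("Xi",))
t_false = threading.Thread(target=assigner, args=("not-Xi",))
t_true.start()
t_false.start()
Ai.release()                  # signal(Ai): enables exactly one of the two
t_true.join()
t_false.join()

assert len(winners) == 1      # exactly one truth value was "guessed"
```

Which thread wins is up to the scheduler; that nondeterminism is exactly what lets the program "guess" an assignment.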
Proof of Theorem 1 – The Construction per Clause • For each clause, Cj, the following three threads are constructed: Thread 1: wait( L1 ); signal( Cj ). Thread 2: wait( L2 ); signal( Cj ). Thread 3: wait( L3 ); signal( Cj ). • L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e. Xi or not-Xi). • The semaphore Cj represents the truth value of clause Cj. It is signaled iff the truth assignments to the variables cause clause Cj to evaluate to True.
Proof of Theorem 1 –Explanation of Construction • The first 3n threads operate in two phases: • The first pass is a non-deterministic guessing phase in which: • Each variable used in the boolean formula B is assigned a unique truth value. • Only one of the Xi and not-Xi semaphores is signaled. • The second pass (begins after semaphore Pass2 is signaled) is used to ensure that the program doesn’t deadlock: • The semaphore operations that were not allowed to execute during the first pass are allowed to proceed.
Proof of Theorem 1 – The Final Construction • Two additional threads are created: Thread 1: a: skip; signal( Pass2 ); … ; signal( Pass2 ). Thread 2: wait( C1 ); … ; wait( Cm ); b: skip. • There are n ‘signal(Pass2)’ operations – one for each variable. • There are m ‘wait(Cj)’ operations – one for each clause.
Proof of Theorem 1 – Putting It All Together • Event b is reached only after semaphore Cj, for each clause j, has been signaled. • The program contains no conditional statements or shared variables. • Every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none). • Claim: For any execution a MHB→ b iff B is not satisfiable.
Proof of Theorem 1 –Proving the “if” Part • Assume that B is not satisfiable. • Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass. • Event b can’t execute until this signal(Cj) operation is performed, which can then only be done during the second pass. • The second pass doesn’t occur until after event a executes, so event a must precede event b. • Therefore, a MHB→ b.
Proof of Theorem 1 –Proving the “only if” Part • Assume that a MHB→ b. • This means that there is no execution in which b either precedes a or executes concurrently with a. • Assume by way of contradiction that B is satisfiable. • Then some truth assignment can be guessed during the first pass that satisfies all of the clauses. • Event b can then execute before event a, contradicting the assumption. • Therefore, B is not satisfiable.
Complexity of Computing Ordering Relations – Cont. • Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard. • By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard. • Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard. • Proof by similar reductions …
Complexity of Race Detection – Conditions, Loops and Input • The presented model is too simplistic. • What if “if” and “while” statements are used? What if user input is allowed? • In the slide’s code example (omitted here): if the input Y≥0 there is a data-race; otherwise it is not possible, since statement [1] is never reached.
Complexity of Race Detection – “NP-Harder”? • The proof above uses no conditional statements, loops or outside input. • With them, the problem of data-race detection is much harder than deciding an NP-complete problem. • Intuitively – not even an exponential solution exists, since it is not known whether the program will ever stop. • Thus, in the general case, it is undecidable.
So How Can Data-Races Be Detected? – Approximations • Deciding whether a CHB→ b or a CCW↔ b would reveal the feasible data-races. • Since this is an intractable problem, the temporal ordering relation T→ is approximated instead, and apparent data-races are located. • Recall that apparent data-races exist if and only if at least one feasible data-race exists. • Yet, it remains a hard problem to locate all apparent data-races.
Approximation Example – Lamport’s Happens-Before • The happens-before partial order, denoted a hb→ b, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows: • Program Order: a and b are events performed by the same thread, with a preceding b. • Release and Acquire: a is a release of some synchronization object S and b is a corresponding acquire. • Transitivity: a hb→ c and c hb→ b ⇒ a hb→ b. • Shared accesses a and b are concurrent, a hb↮ b, if neither a hb→ b nor b hb→ a holds.
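The standard way to track happens-before at runtime is with vector clocks, which is also the machinery the DJIT+ section below builds on. A generic sketch (not the DJIT+ algorithm itself; the `VC` class and event names are mine): each thread increments its own clock entry per event, and an Unlock/Lock pair on the same object transfers the releaser's clock to the acquirer.

```python
class VC:
    """Per-thread vector clock over n threads."""
    def __init__(self, n, tid):
        self.tid, self.clock = tid, [0] * n

    def tick(self):
        self.clock[self.tid] += 1
        return tuple(self.clock)      # timestamp of this event

    def merge(self, released_clock):  # on acquire of a released lock
        self.clock = [max(x, y) for x, y in zip(self.clock, released_clock)]

def hb(ts_a, ts_b):
    """a hb-> b iff a's timestamp is pointwise <= b's, and they differ."""
    return ts_a != ts_b and all(x <= y for x, y in zip(ts_a, ts_b))

# Thread 0 writes X then releases m; thread 1 acquires m then reads X.
t0, t1 = VC(2, 0), VC(2, 1)
a = t0.tick()            # t0: X = 1
rel = t0.tick()          # t0: Unlock(m) -- the lock carries t0's clock
t1.merge(rel)            # t1: Lock(m) merges the releaser's clock
b = t1.tick()            # t1: T = X
c = VC(2, 1).tick()      # an event with no synchronization chain to a

assert hb(a, b)                      # ordered by the Unlock/Lock chain
assert not hb(a, c) and not hb(c, a) # concurrent: a potential race
```

Two accesses whose timestamps are incomparable under `hb` are exactly the concurrent accesses of the definition above.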
Approaches to Detection of Apparent Data-Races – Static There are two main approaches to detection of apparent data-races (sometimes a combination of both is used): • Static – perform a compile-time analysis of the code. – Too conservative: • Can’t know or understand the semantics of the program. • Results in excessive false alarms that hide the real data-races. + Tests the program globally: • Sees the whole code of the tested program. • Can warn about all possible errors in all possible executions.
Approaches to Detection of Apparent Data-Races – Dynamic • Dynamic – use a tracing mechanism to detect whether a particular execution actually exhibited data-races. + Detects only those apparent data-races that actually occur during a feasible execution. – Tests the program locally: • Considers only one specific execution path of the program at a time. • Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found. • On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.
Approaches to Detection of Apparent Data-Races • No “silver bullet” exists. • Accuracy is of great importance (especially in large programs). • There is always a tradeoff between the amount of false negatives (undetected races) and false positives (false alarms). • The space and time overheads imposed by the techniques are significant as well.
Closer Look at Dynamic Methods • We show two dynamic methods for on-the-fly detection of apparent data-races in multi-threaded programs with locks and barriers: • DJIT+ – based on Lamport’s happens-before partial order relation and Mattern’s virtual time (vector clocks). Implemented in the Millipede and MultiRace systems. • Lockset – based on locking discipline and lockset refinement. Implemented in the Eraser tool and the MultiRace system.
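The core of the Lockset approach is the refinement rule from Eraser: the candidate set C(v) starts as the set of all locks and is intersected, on each access to v, with the locks the accessing thread holds; an empty C(v) means no single lock consistently protects v. A rough sketch of just this rule (the class and method names are mine, and real tools refine the discipline for initialization, read-sharing and barriers, as covered later):

```python
class Lockset:
    """Eraser-style lockset refinement, basic discipline only."""
    def __init__(self):
        self.candidates = {}             # variable -> candidate lock set C(v)

    def access(self, var, locks_held):
        """Record an access to var by a thread holding locks_held.
        Returns False when C(v) becomes empty -- a possible data-race."""
        if var not in self.candidates:
            self.candidates[var] = set(locks_held)   # first access
        else:
            self.candidates[var] &= set(locks_held)  # lockset refinement
        return bool(self.candidates[var])

d = Lockset()
assert d.access("X", {"m"})        # some thread accesses X holding m
assert d.access("X", {"m", "n"})   # another holds m and n -> C(X) = {m}
assert not d.access("X", {"n"})    # holds only n -> C(X) empty: warn!
```

Note that the rule never consults the order of accesses, which is why Lockset can flag races that happened not to occur in this run, at the cost of the false alarms discussed later.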