Dynamic Data-Race Detection in Lock-Based Multi-Threaded Programs. Prepared by Eli Pozniansky under the supervision of Prof. Assaf Schuster.
Table of Contents • What is a Data-Race? • Why Data-Races are Undesired? • How Data-Races Can be Prevented? • Can Data-Races be Easily Detected? • Feasible and Apparent Data-Races • Complexity of Data-Race Detection • Program Execution Model • Complexity of Computing Ordering Relations • Proof of NP/Co-NP Hardness
Table of Contents Cont. • So How Data-Races Can be Detected? • Lamport’s Happens-Before Approximation • Approaches to Detection of Apparent Data-Races: • Static Methods • Dynamic Methods: • Post-Mortem Methods • On-The-Fly Methods
Table of Contents Cont. • Closer Look at Dynamic Methods: • DJIT • Local Time Frames • Vector Time Frames • Predicate for Data-Race Detection • Which Accesses to Check? • Which Time Frames to Check? • Access History • First Data-Race • Results
Table of Contents Cont. • Lockset • Locking Discipline • The Basic Algorithm • Improving Locking Discipline • Initialization • Read-Sharing • Refinement for Read-Write Locks • False Alarms • Results • Summary • References
What is a Data-Race? • A data-race is an anomaly in which two or more threads access a shared variable concurrently and at least one of the accesses is a write. • Example (variable X is global and shared):
    Thread 1        Thread 2
    X=1             T=Y
    Z=2             T=X
Why Data-Races are Undesired? • Programs that contain data-races usually exhibit unexpected and even non-deterministic behavior. • The outcome may depend on the specific execution order (a.k.a. the threads’ interleaving). • Re-running the program may not always produce the same results. • Thus, such programs are hard to debug, and correct programs are hard to write.
Why Data-Races are Undesired? – Example • First Interleaving:
       Thread 1        Thread 2
    1. X=0
    2.                 T=X
    3. X++
• Second Interleaving:
       Thread 1        Thread 2
    1. X=0
    2. X++
    3.                 T=X
• T==0 or T==1? (T==0 in the first interleaving, T==1 in the second.)
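The two interleavings above can be replayed deterministically. The following is a minimal sketch, not part of the original slides: the helper `run` and the dictionary-based state are illustrative assumptions, and the "threads" are simulated by executing the three statements in an explicitly given global order.

```python
def run(schedule):
    """Execute the three statements in the given global order and return T."""
    state = {"X": None, "T": None}
    statements = {
        "X=0": lambda s: s.update(X=0),           # Thread 1
        "X++": lambda s: s.update(X=s["X"] + 1),  # Thread 1
        "T=X": lambda s: s.update(T=s["X"]),      # Thread 2
    }
    for name in schedule:
        statements[name](state)
    return state["T"]


first = run(["X=0", "T=X", "X++"])   # Thread 2 reads X before the increment
second = run(["X=0", "X++", "T=X"])  # Thread 2 reads X after the increment
print(first, second)
```

The same program text yields two different values of T purely because of the order in which the scheduler happened to run the statements.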
Execution Order [Figure: threads T1 and T2 drawn along a time axis] • Each thread has a different execution speed, which may change over time. • For an external observer of the time axis, the instructions’ execution is ordered in execution order. • Any order is legal. • The execution order of a single thread is called program order.
How Data-Races Can be Prevented? – Explicit Synchronization • Idea: In order to prevent undesired concurrent accesses to shared locations, we must explicitly synchronize between threads. • The means for explicit synchronization are: • Locks, Mutexes and Critical Sections • Barriers • Binary Semaphores and Counting Semaphores • Monitors • Single-Writer/Multiple-Readers (SWMR) Locks • Others
Synchronization – “Bad” Bank Account Example
    Thread 1                    Thread 2
    Deposit( amount ) {         Withdraw( amount ) {
      balance+=amount;            if (balance<amount)
    }                               print( "Error" );
                                  else
                                    balance-=amount;
                                }
• ‘Deposit’ and ‘Withdraw’ are not “atomic”! • What is the final balance after a series of concurrent deposits and withdrawals?
Synchronization – “Good” Bank Account Example
    Thread 1                    Thread 2
    Deposit( amount ) {         Withdraw( amount ) {
      Lock( m );                  Lock( m );
      balance+=amount;            if (balance<amount)
      Unlock( m );                  print( "Error" );
    }                             else
                                    balance-=amount;
                                  Unlock( m );
                                }
• The code between Lock( m ) and Unlock( m ) in each thread is a critical section. • Since critical sections can never execute concurrently, this version exhibits no data-races.
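The locked version can be tried out with real threads. The following is a minimal sketch in Python, assuming `threading.Lock` plays the role of the mutex `m`; the class `Account`, the helper `run_demo` and the starting balance are illustrative choices, not taken from the slides. The starting balance is large enough that no withdrawal can fail, so the final balance is the same under every interleaving.

```python
import threading


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.m = threading.Lock()  # plays the role of lock m on the slide

    def deposit(self, amount):
        with self.m:               # Lock(m) ... Unlock(m)
            self.balance += amount

    def withdraw(self, amount):
        with self.m:               # the check and the update form one critical section
            if self.balance < amount:
                return False       # corresponds to print("Error")
            self.balance -= amount
            return True


def run_demo(workers=4, ops=1000, start=10_000):
    account = Account(start)

    def depositor():
        for _ in range(ops):
            account.deposit(1)

    def withdrawer():
        for _ in range(ops):
            account.withdraw(1)

    threads = [threading.Thread(target=f) for f in [depositor, withdrawer] * workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


final_balance = run_demo()
print(final_balance)
```

With equal numbers of unit deposits and withdrawals the balance returns to its starting value; removing the `with self.m:` lines reintroduces the race from the “bad” example.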
Is This Enough? • Theoretically – YES. • Practically – NO. • What if the programmer accidentally forgets to place the correct synchronization? • How can all such data-race bugs be detected in a large program?
Can Data-Races be Easily Detected? – No! • Unfortunately, the problem of deciding whether a given program contains potential data-races is computationally hard! • There are a lot of execution orders. For t threads of n instructions each, the number of possible orders is about t^(n·t). • In addition to all the different schedulings, all possible inputs should be tested as well. • To compound the problem, inserting detection code into a program can perturb its execution schedule enough to make all errors disappear.
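To get a feel for the numbers, the count of execution orders can be computed directly. The sketch below is an illustration, not part of the slides: it assumes every thread runs exactly n instructions with no synchronization, in which case the exact number of interleavings is the multinomial coefficient (n·t)! / (n!)^t, and t^(n·t) (one of t threads chosen at each of the n·t steps) is an upper bound on it. The function names are hypothetical.

```python
from math import factorial


def exact_interleavings(t, n):
    """Number of ways to interleave t threads of n instructions each."""
    return factorial(n * t) // factorial(n) ** t


def upper_bound(t, n):
    """Choosing one of t threads at each of the n*t steps."""
    return t ** (n * t)


for t, n in [(2, 2), (3, 2), (4, 10)]:
    print(t, n, exact_interleavings(t, n), upper_bound(t, n))
```

Even four threads of ten instructions each already give more than 10^21 orders, which is why exhaustive testing of schedules is not practical.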
Feasible Data-Races • Feasible Data-Races: races that are based on the possible behavior of the program (i.e. the semantics of the program’s computation). • These are the actual (!) data-races that can possibly happen in some specific execution. • Locating feasible data-races requires a full analysis of the program’s semantics to determine whether the execution could have allowed a and b (accesses to the same shared variable) to execute concurrently.
Apparent Data-Races • Apparent Data-Races: approximations (!) of feasible data-races that are based only on the behavior of the explicit synchronization performed by some feasible execution (and not on the semantics of the program’s computation, i.e. ignoring all conditional statements). • Important, since data-races are usually a result of improper synchronization. They are thus easier to detect, but less accurate.
Apparent Data-Races Cont. • For example, a and b, accesses to the same shared variable in some execution, are said to be ordered if there is a chain of corresponding explicit synchronization events between them. • Similarly, a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.
Feasible vs. Apparent – Example 1
    [initially F=false]
    Thread 1        Thread 2
    X++;            while (F==false) {};
    F=true;         X--;
• Apparent data-races in the execution above: 1 (on F) and 2 (on X); there is no synchronization chain between the racing accesses. • Feasible data-race: 1 only! No feasible execution exists in which ‘X--’ is performed before ‘X++’ (given that F is false at the start). • Note that protecting only ‘F’ will protect X as well.
Feasible vs. Apparent – Example 2
    [initially F=false]
    Thread 1        Thread 2
    X++;            while( 1 ) {
    Lock( m );        Lock( m );
    F=true;           if ( F == true ) break;
    Unlock( m );      Unlock( m );
                    }
                    X--;
• No feasible or apparent data-races exist under any execution order! • F is protected by means of a lock. The accesses to X are always ordered and properly synchronized.
Complexity of Data-Race Detection • Exactly locating the feasible data-races is an NP-hard problem. Thus, the apparent races, which are simpler to locate, must be detected for debugging. • Fortunately, apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution. • Yet, the problem of exhaustively locating all apparent data-races still remains NP-hard.
Reminder: NP and Co-NP • NP is the class of decision problems for which a ‘yes’ answer can be verified in polynomial time. For the hardest problems in NP: • No polynomial-time solution is known. • An exponential-time solution exists. • A problem is NP-hard if there is a polynomial reduction from every problem in NP to it. A problem is NP-complete if, in addition, it resides in NP. • Intuitively – for a ‘yes’/‘no’ problem we can check candidate solutions one by one: we answer ‘yes’ and stop as soon as one of them checks out, but answering ‘no’ requires ruling out all of them (which is not known to be possible in polynomial time).
Reminder: NP and Co-NP Cont. • Co-NP is the class of problems whose complements are in NP (i.e. the ‘yes’ and ‘no’ answers swap roles). • For a Co-NP problem, it is the ‘no’ answer that can be verified in polynomial time (by a counterexample). • Every problem in P is in both NP and Co-NP; whether every problem that is in both NP and Co-NP is also in P is an open question. • The problem of checking whether a boolean formula is satisfiable is NP-complete (answer ‘yes’ if a satisfying assignment for the variables is found). • The complementary problem, checking that a formula is not satisfiable, is Co-NP-complete.
Why Data-Race Detection is NP-Hard? • How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent? • Intuitively – we must check all execution orders of P and see. If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. Otherwise we should continue checking.
Program Execution Model • Consider a class of multi-threaded programs that synchronize by counting semaphores. • Program execution is described by collection of events and two relations over the events. • Synchronization event – instance of some synchronization operation (e.g. signal, wait). • Computation event – instance of a group of statements in same thread, none of which are synchronization operations (e.g. x=x+1).
Program Execution Model – Events’ Relations • Temporal ordering relation – a T→ b means that a completes before b begins (i.e. the last action of a can affect the first action of b). • Shared data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the accesses is a modification of the variable. It indicates when one event causally affects another.
Program Execution Model – Program Execution • Program execution P – a triple <E,T→,D→>, where E is a finite set of events, and T→ and D→ are the above relations, which satisfy the following axioms: • A1: T→ is an irreflexive partial order (a T↛ a). • A2: If a T→ b T↮ c T→ d then a T→ d. • A3: If a D→ b then b T↛ a. • Notes: • a ↛ b is shorthand for ¬(a → b). • a ↮ b is shorthand for ¬(a → b) ⋀ ¬(b → a). • Notice that A1 and A2 imply transitivity of the T→ relation.
Program Execution Model – Feasible Program Execution • Feasible program execution for P – an execution of the program that performs exactly the same events as P, but may exhibit a different temporal ordering. • Definition: P’=<E’,T’→,D’→> is a feasible program execution for P=<E,T→,D→> (i.e. one that could potentially have occurred) if • F1: E’=E (i.e. exactly the same events), and • F2: P’ satisfies the axioms A1 - A3 of the model, and • F3: a D→ b ⇒ a D’→ b (i.e. same data dependencies) • Note: Any execution that exhibits the same shared-data dependencies as P will execute exactly the same events as P.
Program Execution Model – Ordering Relations • Given a program execution, P=<E,T→,D→>, and the set, F(P), of feasible program executions for P, six ordering relations summarize the temporal orderings present in the feasible program executions: must-have and could-have variants of happened-before (MHB, CHB), concurrent-with (MCW, CCW) and ordered-with (MOW, COW).
Program Execution Model – Ordering Relations – Explanation • The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P). • The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P). • The happened-before relations show events that execute in a specific order, the concurrent-with relations show events that execute concurrently, and the ordered-with relations show events that execute in either order but not concurrently.
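The relations explained above can be written out formally. The following is a sketch reconstructed from that explanation (must-have means "in all feasible executions", could-have means "in at least one"); it is not a copy of the original slide's table. Here T' denotes the temporal ordering of a feasible execution P' = <E,T'→,D'→> in F(P), and "not concurrent" is taken to mean that one of the two events completes before the other begins.

```latex
\begin{align*}
a \xrightarrow{\mathrm{MHB}} b &\iff \forall P' \in F(P):\ a \xrightarrow{T'} b \\
a \xrightarrow{\mathrm{CHB}} b &\iff \exists P' \in F(P):\ a \xrightarrow{T'} b \\
a \overset{\mathrm{MCW}}{\longleftrightarrow} b &\iff \forall P' \in F(P):\ \neg(a \xrightarrow{T'} b) \wedge \neg(b \xrightarrow{T'} a) \\
a \overset{\mathrm{CCW}}{\longleftrightarrow} b &\iff \exists P' \in F(P):\ \neg(a \xrightarrow{T'} b) \wedge \neg(b \xrightarrow{T'} a) \\
a \overset{\mathrm{MOW}}{\longleftrightarrow} b &\iff \forall P' \in F(P):\ (a \xrightarrow{T'} b) \vee (b \xrightarrow{T'} a) \\
a \overset{\mathrm{COW}}{\longleftrightarrow} b &\iff \exists P' \in F(P):\ (a \xrightarrow{T'} b) \vee (b \xrightarrow{T'} a)
\end{align*}
```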
Complexity of Computing Ordering Relations • The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard and the problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard. • Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.
Proof of Theorem 1 – Notes • The presented proof is only for the must-have-happened-before (MHB) relation. Proofs for the other relations are analogous. • The proof is a reduction from 3CNFSAT such that the given boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction. • The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete. • The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).
Proof of Theorem 1 – 3CNFSAT • An instance of 3CNFSAT is given by: • A set of n variables, V={X1,X2, …,Xn}. • A boolean formula B consisting of a conjunction of m clauses, B=C1⋀C2⋀…⋀Cm. • Each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals. • Each literal Lk is any variable from V or its negation: Lk=Xi or Lk=¬Xi. • Example: B=(X1⋁X2⋁¬X3)⋀(¬X2⋁¬X5⋁X6)⋀(X1⋁X4⋁¬X5)
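The example formula can be evaluated mechanically. The sketch below is an illustration only: it assumes a signed-integer encoding of literals (+i for Xi, -i for ¬Xi), and the names `evaluate` and `satisfying_assignments` are hypothetical. It decides satisfiability by brute force over all 2^n assignments, which is exactly the exponential search that the reduction in the proof simulates with threads.

```python
from itertools import product

# Example formula from the slide: +i stands for Xi, -i for the negation of Xi.
B = [(1, 2, -3), (-2, -5, 6), (1, 4, -5)]


def evaluate(clauses, assignment):
    """True iff every clause contains at least one literal that is true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def satisfying_assignments(clauses, n):
    """Yield every assignment {1..n -> bool} that satisfies the formula."""
    for values in product([False, True], repeat=n):
        assignment = dict(enumerate(values, start=1))
        if evaluate(clauses, assignment):
            yield assignment


first_model = next(satisfying_assignments(B, 6))
print(first_model)
```

For this formula the search succeeds immediately, since setting every variable to False already satisfies each clause through one of its negated literals.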
Proof of Theorem 1 – Idea of the Proof • Given an instance of a 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0). • The execution of this program simulates a nondeterministic evaluation of B. • Semaphores are used to represent the truth values of each variable and clause. • The execution exhibits certain orderings iff B is not satisfiable.
Proof of Theorem 1 – The Construction per Variable • For each variable, Xi, the following three threads are constructed:
    Thread 1            Thread 2                Thread 3
    wait( Ai )          wait( Ai )              signal( Ai )
    signal( Xi )        signal( not-Xi )        wait( Pass2 )
    . . .               . . .                   signal( Ai )
    signal( Xi )        signal( not-Xi )
• “. . .” indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ¬Xi) in the formula B.
Proof of Theorem 1 – The Construction per Variable • The semaphores Xi and not-Xi are used to represent the truth value of variable Xi. • Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to variable Xi. • The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to the concurrent wait(Ai) operations in the two leftmost threads).
Proof of Theorem 1 – The Construction per Clause • For each clause, Cj, the following three threads are constructed:
    Thread 1            Thread 2            Thread 3
    wait( L1 )          wait( L2 )          wait( L3 )
    signal( Cj )        signal( Cj )        signal( Cj )
• L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e. Xi or not-Xi). • The semaphore Cj represents the truth value of clause Cj. It is signaled iff the truth assignments to the variables cause the clause Cj to evaluate to True.
Proof of Theorem 1 – Explanation of Construction • The first 3n threads operate in two passes: • The first pass is a non-deterministic guessing phase in which each variable used in the boolean formula B is assigned a unique truth value. Only one of the Xi and not-Xi semaphores is signaled. • The second pass, which begins after semaphore Pass2 is signaled, is used to ensure that the program doesn’t deadlock: the semaphore operations that were not allowed to execute during the first pass are allowed to proceed.
Proof of Theorem 1 – The Final Construction • Two additional threads are created:
    Thread 1                Thread 2
    wait( C1 )              a: skip
    . . .                   signal( Pass2 )
    wait( Cm )              . . .
    b: skip                 signal( Pass2 )
• There are n ‘signal(Pass2)’ operations – one for each variable. • There are m ‘wait(Cj)’ operations – one for each clause.
Proof of Theorem 1 – Putting All Together • Event b is reached only after semaphore Cj, for each clause j, has been signaled. • Since the program contains no conditional statements or shared variables, every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none). • Claim: For any execution, a MHB→ b iff B is not satisfiable.
Proof of Theorem 1 – Proving the “if” Part • Assume that B is not satisfiable. • Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass. • Event b can’t execute until this signal(Cj) operation is performed, which can then only be done during the second pass. • The second pass doesn’t occur until after event a executes, so event a must precede event b. • Therefore, a MHB→ b.
Proof of Theorem 1 – Proving the “only if” Part • Assume that a MHB→ b. • This means that there is no execution in which b either precedes a or executes concurrently with a. • Assume by way of contradiction that B is satisfiable. • Then some truth assignment can be guessed during the first pass that satisfies all of the clauses. • Event b can then execute before event a, contradicting the assumption. • Therefore, B is not satisfiable.
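The claim can be checked on tiny formulas by building the threads of the reduction and exhaustively exploring their interleavings. This is a minimal sketch under stated assumptions, not the authors' code: literals are encoded as signed integers (+i for Xi, -i for ¬Xi), the names `build_threads` and `b_can_precede_a` are hypothetical, and "b does not have to follow a" is tested as "b is reachable in some interleaving in which a has not yet executed".

```python
def build_threads(n_vars, clauses):
    """Build the 3n+3m+2 threads; each thread is a list of (kind, name) steps."""
    occurrences = {}
    for clause in clauses:
        for lit in clause:
            occurrences[lit] = occurrences.get(lit, 0) + 1
    threads = []
    for i in range(1, n_vars + 1):
        threads.append([("wait", ("A", i))] + [("signal", ("lit", i))] * occurrences.get(i, 0))
        threads.append([("wait", ("A", i))] + [("signal", ("lit", -i))] * occurrences.get(-i, 0))
        threads.append([("signal", ("A", i)), ("wait", "Pass2"), ("signal", ("A", i))])
    for j, clause in enumerate(clauses):
        for lit in clause:
            threads.append([("wait", ("lit", lit)), ("signal", ("C", j))])
    threads.append([("event", "a")] + [("signal", "Pass2")] * n_vars)
    threads.append([("wait", ("C", j)) for j in range(len(clauses))] + [("event", "b")])
    return threads


def b_can_precede_a(threads):
    """Depth-first search over interleavings in which event a has not run yet."""
    start = tuple(0 for _ in threads)
    seen, stack = {start}, [start]
    while stack:
        pcs = stack.pop()
        # Semaphore values follow from the steps already executed (all start at 0).
        sem = {}
        for steps, pc in zip(threads, pcs):
            for kind, name in steps[:pc]:
                if kind == "signal":
                    sem[name] = sem.get(name, 0) + 1
                elif kind == "wait":
                    sem[name] = sem.get(name, 0) - 1
        for k, (steps, pc) in enumerate(zip(threads, pcs)):
            if pc == len(steps):
                continue                      # thread k has finished
            kind, name = steps[pc]
            if kind == "event":
                if name == "b":
                    return True               # b is enabled although a has not run
                continue                      # never execute a: we only want prefixes without it
            if kind == "wait" and sem.get(name, 0) == 0:
                continue                      # blocked on the semaphore
            nxt = pcs[:k] + (pc + 1,) + pcs[k + 1:]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


satisfiable_case = b_can_precede_a(build_threads(1, [(1, 1, 1)]))
unsatisfiable_case = b_can_precede_a(build_threads(1, [(1, 1, 1), (-1, -1, -1)]))
print(satisfiable_case, unsatisfiable_case)
```

For the satisfiable formula (X1⋁X1⋁X1) the search finds an interleaving that reaches b without a; for the unsatisfiable formula (X1⋁X1⋁X1)⋀(¬X1⋁¬X1⋁¬X1) no such interleaving exists, which matches a MHB→ b. The search is exponential in general, as the theorem leads one to expect, so it is only usable on very small formulas.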
Complexity of Computing Ordering Relations – Cont. • Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard. • By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard. • Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard. • Proof by similar reductions …
Complexity of Race Detection – Conditions, Loops and Input • The presented model is too simplistic. • What if conditional statements, like “if” and “while”, are used? What if input from the user is allowed? • Example (the slide’s code figure is not preserved here): if Y≥0 there is a data-race on X. Otherwise a race is not possible, since ‘X--’ is never reached.
Complexity of Race Detection – “NP-Harder”? • The proof above does not use conditional statements, loops or input from outside. • This suggests that the problem of data-race detection may be even harder than deciding an NP-complete problem. • With loops and recursion, we do not know whether potentially concurrent accesses will indeed be executed, so the question becomes equivalent to the halting problem. • Thus, in the general case, race detection is undecidable.
So How Data-Races Can be Detected? – Approximations • Since it is an intractable problem to decide whether a CHB→ b or a CCW↔ b (needed to detect feasible data-races), the temporal ordering relation T→ should be approximated and apparent data-races located instead. • Recall that apparent data-races exist if and only if at least one feasible race exists. • Yet, it remains a hard problem to locate all apparent data-races.
Approximation Example – Lamport’s Happens-Before • The happens-before partial order, denoted hb→, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows: • Program Order: If a and b are events performed by the same thread, with a preceding b in program order, then a hb→ b. • Release and Acquire: Let a be a release and b be an acquire. If a and b take part in the same synchronization event, then a hb→ b. • Transitivity: If a hb→ b and b hb→ c, then a hb→ c. • Shared accesses a and b are concurrent (denoted by a hb↮ b) if neither a hb→ b nor b hb→ a holds.
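The three rules above translate directly into a small computation over a recorded trace. The following is a minimal sketch, not the DJIT algorithm: the trace format (thread, operation, target) and the function names are illustrative assumptions, and "take part in the same synchronization event" is interpreted here as a release of a lock followed by the next acquire of that same lock.

```python
def happens_before(trace):
    """Return a matrix hb where hb[i][j] is True iff event i happens-before event j.

    trace: list of (thread, op, target) tuples in observed order, with
    op in {"read", "write", "acquire", "release"}.
    """
    n = len(trace)
    hb = [[False] * n for _ in range(n)]
    last_in_thread = {}   # thread -> index of its most recent event
    last_release = {}     # lock   -> index of its most recent release
    for j, (thread, op, target) in enumerate(trace):
        if thread in last_in_thread:
            hb[last_in_thread[thread]][j] = True   # program order
        last_in_thread[thread] = j
        if op == "acquire" and target in last_release:
            hb[last_release[target]][j] = True     # release -> acquire
        if op == "release":
            last_release[target] = j
    for k in range(n):                             # transitivity (Warshall closure)
        for i in range(n):
            if hb[i][k]:
                for j in range(n):
                    if hb[k][j]:
                        hb[i][j] = True
    return hb


def concurrent(hb, i, j):
    """Events i and j are concurrent if neither happens-before the other."""
    return not hb[i][j] and not hb[j][i]


trace = [
    ("T1", "write", "X"),    # 0
    ("T1", "acquire", "m"),  # 1
    ("T1", "write", "Y"),    # 2
    ("T1", "release", "m"),  # 3
    ("T2", "acquire", "m"),  # 4
    ("T2", "read", "Y"),     # 5
    ("T2", "release", "m"),  # 6
    ("T2", "read", "X"),     # 7
    ("T3", "write", "X"),    # 8
]
hb = happens_before(trace)
print(concurrent(hb, 0, 7), concurrent(hb, 0, 8))
```

In this trace the write and read of X by T1 and T2 are ordered through the release and acquire of m, while T3's unsynchronized write to X is concurrent with both, so it would be reported as an apparent data-race.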
Approaches to Detection of Apparent Data-Races – Static • There are two main approaches to detection of apparent data-races (sometimes a combination of both is used): • Static Methods – perform a compile-time analysis of the code. • (–) Too conservative: they can’t know or understand the semantics of the program, and so produce an excessive number of false alarms that hide the real data-races. • (+) Test the program globally: they see the full code of the tested program and can warn about all possible errors in all possible executions.
Approaches to Detection of Apparent Data-Races – Dynamic • Dynamic Methods – use a tracing mechanism to detect whether a particular execution of a program actually exhibited data-races. • (+) Detect only those apparent data-races that occur during a feasible execution. • (–) Test the program locally: they consider only one specific execution path of the program each time. • Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found. • On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.
Approaches to Detection of Apparent Data-Races • No “silver bullet” exists. • Accuracy is of great importance (especially in large programs). • Yet, there is always a tradeoff between the number of false negatives (undetected races) and false positives (false alarms). • The space and time overheads imposed by the techniques are significant as well.
Closer Look at Dynamic Methods • We will see two dynamic methods for on-the-fly detection of apparent data-races in lock-based multi-threaded programs: • DJIT – based on Lamport’s happens-before partial order relation and Mattern’s virtual time (vector clocks). Implemented in the Millipede and Multipage systems. • Lockset – based on locking discipline and lockset refinement. Implemented in the Eraser tool.