This article discusses the concept of data races in multi-threaded programs and explores methods for detecting and preventing them. It covers topics such as the complexity of data race detection, dynamic methods for detection, and the use of locks for synchronization. The article also includes examples and references for further reading.
Dynamic Data-Race Detection in Lock-Based Multi-Threaded Programs
Table of Contents • What is a Data-Race? • Why Data-Races are Undesired? • How Data-Races Can be Prevented? • Can Data-Races be Easily Detected? • Feasible and Apparent Data-Races • Complexity of Data-Race Detection • Program Execution Model • Complexity of Computing Ordering Relations • Proof of NP/Co-NP Hardness
Table of Contents Cont. • So How Data-Races Can be Detected? • Lamport’s Happens-Before Approximation • Approaches to Detection of Apparent Data-Races: • Static Methods • Dynamic Methods: • Post-Mortem Methods • On-The-Fly Methods
Table of Contents Cont. • Closer Look at Dynamic Methods: • DJIT • Local Time Frames • Vector Time Frames • Predicate for Data-Race Detection • Which Accesses to Check? • Which Time Frames to Check? • Access History • First Data-Race • Results
Table of Contents Cont. • Lockset • Locking Discipline • The Basic Algorithm • Improving Locking Discipline • Initialization • Read-Sharing • Refinement for Read-Write Locks • False Alarms • Results • Summary • References
What is a Data-Race? • A data-race is an anomaly in which two or more threads access a shared variable concurrently and at least one of the accesses is a write. • Example (variable X is global and shared):
  Thread 1: X=1; Z=2;
  Thread 2: T=Y; T=X;
• Here the write ‘X=1’ in Thread 1 races with the read ‘T=X’ in Thread 2.
Why Data-Races are Undesired? • Programs that contain data-races usually exhibit unexpected and even non-deterministic behavior. • The outcome may depend on the specific execution order (a.k.a. the threads’ interleaving). • Re-running the program may not produce the same results. • Thus, such programs are hard to debug and hard to write correctly.
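Why Data-Races are Undesired? - Example • Thread 1 executes ‘X=0; X++’ and Thread 2 executes ‘T=X’.
• First Interleaving: 1. X=0 (Thread 1) 2. T=X (Thread 2) 3. X++ (Thread 1)
• Second Interleaving: 1. X=0 (Thread 1) 2. X++ (Thread 1) 3. T=X (Thread 2)
• T==0 or T==1? The first interleaving yields T==0, the second yields T==1.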
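The effect of interleaving in the example above can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every merge of the two threads that preserves program order and collects the resulting values of T. The helper names (`interleavings`, `run`) and the initial value X=0 are assumptions made for illustration.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def set_x(state):   # Thread 1: X=0
    state["X"] = 0


def inc_x(state):   # Thread 1: X++
    state["X"] += 1


def read_x(state):  # Thread 2: T=X
    state["T"] = state["X"]


def run(schedule):
    """Execute one interleaving and return the final value of T."""
    state = {"X": 0, "T": None}  # assumption: X is 0 before either thread runs
    for step in schedule:
        step(state)
    return state["T"]


thread1 = [set_x, inc_x]
thread2 = [read_x]
schedules = list(interleavings(thread1, thread2))
outcomes = sorted({run(s) for s in schedules})
print(len(schedules), "interleavings, possible values of T:", outcomes)
```

Even this three-instruction program has more than one possible outcome, which is what makes racy programs hard to reproduce.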
Execution Order (Figure: threads T1 and T2 drawn along a time axis.) • Each thread has a different execution speed, which may change over time. • For an external observer of the time axis, the instructions are ordered in execution order. • Any order is legal. • Execution order for a single thread is called program order.
How Data-Races Can be Prevented? – Explicit Synchronization • Idea: In order to prevent undesired concurrent accesses to shared locations, we must explicitly synchronize between threads. • The means for explicit synchronization are: • Locks, Mutexes and Critical Sections • Barriers • Binary Semaphores and Counting Semaphores • Monitors • Single-Writer/Multiple-Readers (SWMR) Locks • Others
Synchronization – “Bad” Bank Account Example
  Thread 1: Deposit( amount ) { balance+=amount; }
  Thread 2: Withdraw( amount ) { if (balance<amount) print( “Error” ); else balance-=amount; }
• ‘Deposit’ and ‘Withdraw’ are not “atomic”!!! • What is the final balance after a series of concurrent deposits and withdrawals?
Synchronization – “Good” Bank Account Example
  Thread 1: Deposit( amount ) { Lock( m ); balance+=amount; Unlock( m ); }
  Thread 2: Withdraw( amount ) { Lock( m ); if (balance<amount) print( “Error” ); else balance-=amount; Unlock( m ); }
• The code between Lock( m ) and Unlock( m ) forms the critical sections. • Since critical sections guarded by the same lock can never execute concurrently, this version exhibits no data-races.
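The locked version can be sketched as runnable code. This is an illustrative translation, not the slides' own implementation: the `Account` class and the `run_demo` driver are hypothetical names, and Python's `threading.Lock` stands in for the generic Lock( m )/Unlock( m ) pair.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._m = threading.Lock()  # plays the role of lock m

    def deposit(self, amount):
        with self._m:               # Lock( m ) ... Unlock( m )
            self.balance += amount

    def withdraw(self, amount):
        with self._m:
            if self.balance < amount:
                return False        # the slides print "Error" here
            self.balance -= amount
            return True


def run_demo(ops=1000):
    """One thread deposits 1 `ops` times while another withdraws 1 `ops` times."""
    # Starting at `ops` guarantees that no withdrawal can ever fail.
    account = Account(ops)

    def depositor():
        for _ in range(ops):
            account.deposit(1)

    def withdrawer():
        for _ in range(ops):
            account.withdraw(1)

    threads = [threading.Thread(target=depositor),
               threading.Thread(target=withdrawer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


print(run_demo())
```

Because every read-modify-write of `balance` happens inside the same lock, the final balance equals the starting balance regardless of how the two threads interleave.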
Is This Enough? • Theoretically – YES. • Practically – NO. • What if the programmer accidentally forgets to place the correct synchronization? • How can all such data-race bugs be detected in a large program?
Can Data-Races be Easily Detected? – No! • Unfortunately, the problem of deciding whether a given program contains potential data-races is computationally hard!!! • There are a lot of execution orders. For t threads of n instructions each, the number of possible orders is about t^(n·t). • In addition to all the different schedulings, all possible inputs should be tested as well. • To compound the problem, inserting detection code into a program can perturb its execution schedule enough to make the errors disappear.
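To give a feel for the growth, the exact number of interleavings of t threads with n instructions each is the multinomial coefficient (n·t)!/(n!)^t, which is bounded above by t^(n·t). The sketch below computes both; the function names are chosen here for illustration only.

```python
from math import factorial


def exact_orders(t, n):
    """Number of interleavings of t threads with n instructions each."""
    return factorial(n * t) // factorial(n) ** t


def rough_bound(t, n):
    """The coarser estimate t^(n*t)."""
    return t ** (n * t)


for t, n in [(2, 2), (3, 2), (4, 10)]:
    print(f"t={t}, n={n}: exact={exact_orders(t, n)}, bound={rough_bound(t, n)}")
```

Already for four threads of ten instructions each, exhaustive enumeration of schedules is out of reach.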
Feasible Data-Races • Feasible Data-Races: races that are based on the possible behavior of the program (i.e. the semantics of the program’s computation). • These are the actual (!) data-races that can possibly happen in some specific execution. • Locating feasible data-races requires a full analysis of the program’s semantics to determine whether the execution could have allowed a and b (accesses to the same shared variable) to execute concurrently.
Apparent Data-Races • Apparent Data-Races: approximations (!) of feasible data-races that are based only on the behavior of the explicit synchronization performed by some feasible execution (and not on the semantics of the program’s computation, i.e. ignoring all conditional statements). • Important, since data-races are usually a result of improper synchronization. Apparent data-races are thus easier to detect, but less accurate.
Apparent Data-Races Cont. • For example, a and b, accesses to same shared variable in some execution, are said to be ordered, if there is a chain of corresponding explicit synchronization events between them. • Similarly, a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.
Feasible vs. Apparent Example 1 (initially F = false)
  Thread 1: X++; F=true;
  Thread 2: while (F==false) {}; X--;
• Apparent data-races in the execution above – 1 (on F) and 2 (on X), since there is no synchronization chain between the racing accesses. • Feasible data-race – 1 only!!! – No feasible execution exists in which ‘X--’ is performed before ‘X++’ (given that F is false at the start). • Note that protecting ‘F’ alone will protect X as well.
Feasible vs. Apparent Example 2 (initially F = false)
  Thread 1: X++; Lock( m ); F=true; Unlock( m );
  Thread 2: while( 1 ) { Lock( m ); if ( F == true ) { Unlock( m ); break; } Unlock( m ); } X--;
• No feasible or apparent data-races exist under any execution order!!! • F is protected by means of a lock. The accesses to X are always ordered and properly synchronized.
Complexity of Data-Race Detection • Exactly locating the feasible data-races is an NP-hard problem. Thus, the apparent races, which are simpler to locate, must be detected for debugging. • Fortunately, apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution. • Yet, the problem of exhaustively locating all apparent data-races still remains NP-hard.
Reminder: NP and Co-NP • NP is the class of decision problems for which a ‘yes’ answer can be verified in polynomial time. For the hardest problems in NP: • No polynomial-time solution is known. • An exponential-time solution exists. • A problem is NP-hard if there is a polynomial reduction from every problem in NP to it. A problem is NP-complete if, in addition, it resides in NP. • Intuitively - if the answer to the problem can only be ‘yes’/‘no’, we can answer ‘yes’ and stop as soon as a witness is found; otherwise we must keep searching (and cannot stop in polynomial time).
Reminder: NP and Co-NP Cont. • There is also the set of Co-NP problems, whose complements are in NP. • For a Co-NP-hard problem with answers ‘yes’ or ‘no’, we can stop early only with the answer ‘no’ (when a counterexample is found). • Every problem in P is in both NP and Co-NP; whether every problem that is in both NP and Co-NP is also in P is an open question. • The problem of checking whether a boolean formula is satisfiable is NP-complete (answer ‘yes’ once a satisfying assignment for the variables is found). • The same problem for non-satisfiability is Co-NP-complete.
Why Data-Race Detection is NP-Hard? • How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent? • Intuitively – we must check all execution orders of P and see. If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. Otherwise we must continue checking.
Program Execution Model • Consider a class of multi-threaded programs that synchronize by counting semaphores. • A program execution is described by a collection of events and two relations over the events. • Synchronization event – an instance of some synchronization operation (e.g. signal, wait). • Computation event – an instance of a group of statements in the same thread, none of which are synchronization operations (e.g. x=x+1).
Program Execution Model – Events’ Relations • Temporal ordering relation – a T→ b means that a completes before b begins (i.e. the last action of a can affect the first action of b). • Shared data dependence relation – a D→ b means that a accesses a shared variable that b later accesses and at least one of the accesses is a modification of the variable. It indicates when one event causally affects another.
Program Execution Model – Program Execution • Program execution P – a triple <E,T→,D→>, where E is a finite set of events, and T→ and D→ are the above relations that satisfy the following axioms: • A1: T→ is an irreflexive partial order (a T↛ a). • A2: If a T→ b T↮ c T→ d then a T→ d. • A3: If a D→ b then b T↛ a. • Notes: • a ↛ b is shorthand for ¬(a→b). • a ↮ b is shorthand for ¬(a→b)⋀¬(b→a). • Notice that A1 and A2 imply transitivity of the T→ relation.
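The three axioms can be checked mechanically on a small execution. The sketch below is only an illustration under assumptions that are not in the slides: relations are represented as Python sets of (a, b) pairs, and the function name `check_axioms` is hypothetical. A1 is tested as irreflexivity plus transitivity, which together make T→ a strict partial order.

```python
def check_axioms(events, T, D):
    """Check A1-A3 for relations given as sets of (a, b) pairs.

    (a, b) in T means a T-> b; (a, b) in D means a D-> b.
    """
    def unordered(b, c):
        # b T<-/-> c: neither b T-> c nor c T-> b
        return (b, c) not in T and (c, b) not in T

    irreflexive = all((e, e) not in T for e in events)
    transitive = all((a, c) in T
                     for (a, b) in T for (b2, c) in T if b == b2)
    # A2: a T-> b, b unordered with c, c T-> d  implies  a T-> d
    a2 = all((a, d) in T
             for (a, b) in T for (c, d) in T if unordered(b, c))
    # A3: a D-> b implies that b does not temporally precede a
    a3 = all((b, a) not in T for (a, b) in D)
    return {"A1": irreflexive and transitive, "A2": a2, "A3": a3}


events = {"a", "b", "c"}
T_ok = {("a", "b"), ("b", "c"), ("a", "c")}
print(check_axioms(events, T_ok, {("a", "c")}))
# Dropping ("a", "c") breaks transitivity, and with it A1 and A2.
print(check_axioms(events, {("a", "b"), ("b", "c")}, set()))
```

In the second call, A2 fails through the case b = c: an event is always unordered with itself under an irreflexive relation, which is why A1 and A2 together force transitivity.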
Program Execution Model – Feasible Program Execution • Feasible program execution for P – an execution of the program that performs exactly the same events as P, but may exhibit a different temporal ordering. • Definition: P’=<E’,T’→,D’→> is a feasible program execution for P=<E,T→,D→> (i.e. one that could potentially have occurred) if • F1: E’=E (i.e. exactly the same events), and • F2: P’ satisfies the axioms A1 - A3 of the model, and • F3: a D→ b ⇒ a D’→ b (i.e. same data dependencies) • Note: Any execution that exhibits the same shared-data dependencies as P will execute exactly the same events as P.
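Program Execution Model – Ordering Relations • Given a program execution, P=<E,T→,D→>, and the set, F(P), of feasible program executions for P, six relations that summarize the temporal orderings present in the feasible program executions are defined: must-have-happened-before (MHB), could-have-happened-before (CHB), must-have-been-concurrent-with (MCW), could-have-been-concurrent-with (CCW), must-have-been-ordered-with (MOW) and could-have-been-ordered-with (COW).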
Program Execution Model – Ordering Relations - Explanation • The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P). • The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P). • The happened-before relations show events that execute in a specific order, the concurrent-with relations show events that execute concurrently, and the ordered-with relations show events that execute in either order but not concurrently.
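Read literally, the explanation above corresponds to the following formal definitions. This is a reconstruction from the surrounding text rather than a copy of the original slide, so the exact typography is an assumption; P' = <E, T'→, D'→> ranges over F(P), and the struck-through double arrow stands for T'↮ as defined in the axioms.

```latex
\begin{align*}
a \overset{MHB}{\rightarrow} b &\iff \forall P' \in F(P):\ a \overset{T'}{\rightarrow} b \\
a \overset{CHB}{\rightarrow} b &\iff \exists P' \in F(P):\ a \overset{T'}{\rightarrow} b \\
a \overset{MCW}{\leftrightarrow} b &\iff \forall P' \in F(P):\ a \overset{T'}{\nleftrightarrow} b \\
a \overset{CCW}{\leftrightarrow} b &\iff \exists P' \in F(P):\ a \overset{T'}{\nleftrightarrow} b \\
a \overset{MOW}{\leftrightarrow} b &\iff \forall P' \in F(P):\ \neg\bigl(a \overset{T'}{\nleftrightarrow} b\bigr) \\
a \overset{COW}{\leftrightarrow} b &\iff \exists P' \in F(P):\ \neg\bigl(a \overset{T'}{\nleftrightarrow} b\bigr)
\end{align*}
```

Each must-have relation quantifies universally over F(P) and each could-have relation existentially, which is the source of the Co-NP versus NP split in the complexity results.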
Complexity of Computing Ordering Relations • The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard and the problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard. • Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.
Proof of Theorem 1 – Notes • The presented proof is only for the must-have-happened-before (MHB) relation. Proofs for the other relations are analogous. • The proof is a reduction from 3CNFSAT such that a given boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction. • The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete. • The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).
Proof of Theorem 1 – 3CNFSAT • An instance of 3CNFSAT is given by: • A set of n variables, V={X1,X2, …,Xn}. • A boolean formula B consisting of a conjunction of m clauses, B=C1⋀C2⋀…⋀Cm. • Each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals. • Each literal Lk is any variable from V or its negation - Lk=Xi or Lk=¬Xi. • Example: B=(X1⋁X2⋁¬X3)⋀(¬X2⋁¬X5⋁X6)⋀(X1⋁X4⋁¬X5)
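For concreteness, a 3CNFSAT instance can be encoded and checked by exhaustive search. The encoding below (a clause is a tuple of three non-zero integers, +i for Xi and -i for ¬Xi) and the name `satisfiable` are illustrative assumptions; the search is exponential in n, consistent with the reminder on NP above.

```python
from itertools import product


def satisfiable(clauses, n_vars):
    """Brute-force 3CNFSAT: try all 2^n truth assignments."""
    for values in product([False, True], repeat=n_vars):
        def literal(l):
            value = values[abs(l) - 1]
            return value if l > 0 else not value
        if all(any(literal(l) for l in clause) for clause in clauses):
            return True
    return False


# The example formula from the slide, over X1..X6.
B = [(1, 2, -3), (-2, -5, 6), (1, 4, -5)]
print(satisfiable(B, 6))

# A trivially unsatisfiable instance: (X1 v X1 v X1) ^ (~X1 v ~X1 v ~X1).
print(satisfiable([(1, 1, 1), (-1, -1, -1)], 1))
```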
Proof of Theorem 1 – Idea of the Proof • Given an instance of a 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0). • The execution of this program simulates a nondeterministic evaluation of B. • Semaphores are used to represent the truth values of each variable and clause. • The execution exhibits certain orderings iff B is not satisfiable.
Proof of Theorem 1 – The Construction per Variable • For each variable, Xi, the following three threads are constructed:
  Thread 1: wait( Ai ); signal( Xi ); . . . signal( Xi );
  Thread 2: wait( Ai ); signal( not-Xi ); . . . signal( not-Xi );
  Thread 3: signal( Ai ); wait( Pass2 ); signal( Ai );
• “. . .” indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ¬Xi) in the formula B.
Proof of Theorem 1 – The Construction per Variable • The semaphores Xi and not-Xi are used to represent the truth value of variable Xi. • Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to variable Xi. • The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to the competing wait(Ai) operations in the first two of the three threads).
Proof of Theorem 1 – The Construction per Clause • For each clause, Cj, the following three threads are constructed:
  Thread 1: wait( L1 ); signal( Cj );
  Thread 2: wait( L2 ); signal( Cj );
  Thread 3: wait( L3 ); signal( Cj );
• L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e. Xi or not-Xi). • The semaphore Cj represents the truth value of clause Cj. It is signaled iff the truth assignments to the variables cause the clause Cj to evaluate to True.
Proof of Theorem 1 – Explanation of Construction • The first 3n threads operate in two passes: • The first pass is a non-deterministic guessing phase in which each variable used in the boolean formula B is assigned a unique truth value. Only one of the Xi and not-Xi semaphores is signaled. • The second pass, which begins after semaphore Pass2 is signaled, is used to ensure that the program doesn’t deadlock – the semaphore operations that were not allowed to execute during the first pass are allowed to proceed.
Proof of Theorem 1 – The Final Construction • Two additional threads are created:
  Thread a: a: skip; signal( Pass2 ); . . . signal( Pass2 ); (n signals)
  Thread b: wait( C1 ); . . . wait( Cm ); b: skip; (m waits)
• There are n ‘signal(Pass2)’ operations – one for each variable. • There are m ‘wait(Cj)’ operations – one for each clause.
Proof of Theorem 1 – Putting All Together • Event b is reached only after semaphore Cj, for each clause j, has been signaled. • Since the program contains no conditional statements or shared variables, every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none). • Claim: For any execution, a MHB→ b iff B is not satisfiable.
Proof of Theorem 1 – Proving the “if” Part • Assume that B is not satisfiable. • Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass. • Event b can’t execute until this signal(Cj) operation is performed, which can then only be done during the second pass. • The second pass doesn’t occur until after event a executes, so event a must precede event b. • Therefore, a MHB→ b.
Proof of Theorem 1 – Proving the “only if” Part • Assume that a MHB→ b. • This means that there is no execution in which b either precedes a or executes concurrently with a. • Assume by way of contradiction that B is satisfiable. • Then some truth assignment can be guessed during the first pass that satisfies all of the clauses. • Event b can then execute before event a, contradicting the assumption. • Therefore, B is not satisfiable.
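The claim can be tried out on tiny formulas by building the semaphore program and exploring its reachable states. The sketch below is an illustration under assumptions, not the construction's original code: clauses use the hypothetical encoding +i for Xi and -i for not-Xi, thread a is frozen before its first step so that Pass2 is never signaled, and the search asks whether thread b can still finish. If it can, b may precede or overlap a, so a MHB→ b does not hold.

```python
from collections import Counter


def build_threads(clauses, n_vars):
    """Return (threads, index of thread a); each thread is a list of (op, semaphore)."""
    occurrences = Counter(lit for clause in clauses for lit in clause)
    threads = []
    for i in range(1, n_vars + 1):                       # three threads per variable
        threads.append([("wait", ("A", i))] + [("signal", ("L", i))] * occurrences[i])
        threads.append([("wait", ("A", i))] + [("signal", ("L", -i))] * occurrences[-i])
        threads.append([("signal", ("A", i)), ("wait", "Pass2"), ("signal", ("A", i))])
    for j, clause in enumerate(clauses):                 # three threads per clause
        for lit in clause:
            threads.append([("wait", ("L", lit)), ("signal", ("C", j))])
    a_index = len(threads)
    threads.append([("event", "a")] + [("signal", "Pass2")] * n_vars)
    threads.append([("wait", ("C", j)) for j in range(len(clauses))] + [("event", "b")])
    return threads, a_index


def b_can_run_before_a(clauses, n_vars):
    """Depth-first search over program counters while thread a never takes a step."""
    threads, a_index = build_threads(clauses, n_vars)
    b_index = a_index + 1
    start = (0,) * len(threads)
    seen, stack = {start}, [start]
    while stack:
        pcs = stack.pop()
        if pcs[b_index] == len(threads[b_index]):
            return True                                  # b executed, a did not
        sem = Counter()                                  # all semaphores start at 0
        for t, pc in enumerate(pcs):
            for op, name in threads[t][:pc]:
                if op == "signal":
                    sem[name] += 1
                elif op == "wait":
                    sem[name] -= 1
        for t, pc in enumerate(pcs):
            if t == a_index or pc == len(threads[t]):
                continue
            op, name = threads[t][pc]
            if op == "wait" and sem[name] == 0:
                continue                                 # blocked on an empty semaphore
            nxt = pcs[:t] + (pc + 1,) + pcs[t + 1:]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


print(b_can_run_before_a([(1, 2, 2)], 2))                # satisfiable formula
print(b_can_run_before_a([(1, 1, 1), (-1, -1, -1)], 1))  # unsatisfiable formula
```

The state space grows exponentially with the formula size, so this only works for toy inputs; that blow-up is exactly what the hardness result predicts.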
Complexity of Computing Ordering Relations – Cont. • Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard. • By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard. • Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard. • Proof by similar reductions …
Complexity of Race Detection - Conditions, Loops and Input • The presented model is too simplistic. • What if conditional statements, like “if” and “while”, are used? What if input from the user is allowed? • In the slide’s example (code figure not reproduced), ‘X--’ is reached only when Y≥0: if Y≥0 there is a data-race on X. Otherwise no data-race is possible, since ‘X--’ is never reached.
Complexity of Race Detection - “NP-Harder”? • The proof above does not use conditional statements, loops or input from outside. • This suggests that the problem of data-race detection may be even harder than deciding an NP-complete problem. • With loops and recursion, we do not know whether potentially concurrent accesses will indeed be executed, so the question becomes equivalent to the halting problem. • Thus, in the general case, race detection is undecidable.
So How Data-Races Can be Detected? – Approximations • Since deciding whether a CHB→ b or a CCW↔ b (needed to detect feasible data-races) is an intractable problem, the temporal ordering relation T→ should be approximated and apparent data-races located instead. • Recall that apparent data-races exist if and only if at least one feasible race exists. • Yet, it remains a hard problem to locate all apparent data-races.
Approximation Example – Lamport’s Happens-Before • The happens-before partial order, denoted hb→, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows: • Program Order: If a and b are events performed by the same thread, with a preceding b in program order, then a hb→ b. • Release and Acquire: Let a be a release and b be an acquire. If a and b take part in the same synchronization event, then a hb→ b. • Transitivity: If a hb→ b and b hb→ c, then a hb→ c. • Shared accesses a and b are concurrent (denoted by a hb↮ b) if neither a hb→ b nor b hb→ a holds.
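The three rules above can be applied to a recorded trace to find apparent data-races. The following is a minimal sketch and not the DJIT or Lockset algorithm: the trace format, the function names, and the pairing rule (an acquire is matched with the most recent earlier release of the same lock) are assumptions made for illustration, and X++/X-- are modeled as single writes.

```python
from itertools import combinations

ACCESS = {"read", "write"}


def happens_before(trace):
    """trace: list of (thread, op, target) in observed order. Returns a boolean matrix."""
    n = len(trace)
    hb = [[False] * n for _ in range(n)]
    last_in_thread, last_release = {}, {}
    for i, (thread, op, target) in enumerate(trace):
        if thread in last_in_thread:
            hb[last_in_thread[thread]][i] = True         # program order
        last_in_thread[thread] = i
        if op == "acquire" and target in last_release:
            hb[last_release[target]][i] = True           # release -> acquire
        if op == "release":
            last_release[target] = i
    for k in range(n):                                   # transitivity (closure)
        for i in range(n):
            if hb[i][k]:
                for j in range(n):
                    if hb[k][j]:
                        hb[i][j] = True
    return hb


def apparent_races(trace):
    """Pairs of conflicting accesses that are not ordered by happens-before."""
    hb = happens_before(trace)
    races = []
    for i, j in combinations(range(len(trace)), 2):
        (ti, oi, xi), (tj, oj, xj) = trace[i], trace[j]
        conflicting = (oi in ACCESS and oj in ACCESS and xi == xj
                       and ti != tj and "write" in (oi, oj))
        if conflicting and not hb[i][j] and not hb[j][i]:
            races.append((i, j))
    return races


# Example 1 from the slides: no locks at all.
example1 = [("T1", "write", "X"), ("T1", "write", "F"),
            ("T2", "read", "F"), ("T2", "write", "X")]
# Example 2 from the slides: F is accessed under lock m.
example2 = [("T1", "write", "X"), ("T1", "acquire", "m"), ("T1", "write", "F"),
            ("T1", "release", "m"), ("T2", "acquire", "m"), ("T2", "read", "F"),
            ("T2", "release", "m"), ("T2", "write", "X")]
print(apparent_races(example1))
print(apparent_races(example2))
```

On the first trace both the accesses to F and the accesses to X are reported, matching the two apparent races in Example 1; on the second trace the release/acquire pair on m orders everything and nothing is reported.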
Approaches to Detection of Apparent Data-Races – Static There are two main approaches to detection of apparent data-races (sometimes a combination of both is used): • Static Methods – perform a compile-time analysis of the code. • (–) Too conservative. They can’t know or understand the semantics of the program, and result in an excessive number of false alarms that hide the real data-races. • (+) Test the program globally – they see the full code of the tested program and can warn about all possible errors in all possible executions.
Approaches to Detection of Apparent Data-Races – Dynamic • Dynamic Methods – use a tracing mechanism to detect whether a particular execution of a program actually exhibited data-races. • (+) Detect only those apparent data-races that occur during a feasible execution. • (–) Test the program locally - they consider only one specific execution path of the program each time. • Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found. • On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.
Approaches to Detection of Apparent Data-Races • No “silver bullet” exists. • Accuracy is of great importance (especially in large programs). • Yet, there is always a tradeoff between the number of false negatives (undetected races) and false positives (false alarms). • The space and time overheads imposed by the techniques are significant as well.
Closer Look at Dynamic Methods • We will see two dynamic methods for on-the-fly detection of apparent data-races in lock-based multi-threaded programs: • DJIT – based on Lamport’s happens-before partial order relation and Mattern’s virtual time (vector clocks). Implemented in the Millipede and Multipage systems. • Lockset – based on locking discipline and lockset refinement. Implemented in the Eraser tool.