Pitfalls in Teaching Development and Testing of Concurrent Programs and How to Overcome Them
Eitan Farchi

Objectives of the course I wanted to teach
• Background
  • The process abstraction, mutual exclusion and conditional synchronization, scheduling policies and fairness, the process life cycle, synchronization primitives (semaphores, monitors), message passing, logical time, examples, …
• Design the protocol through an abstraction
  • Use atomic and atomic-wait primitives
    • $(c_1, s_1) \to s_2 \;\Rightarrow\; (c_1 \parallel c_2, s_1) \to (c_2, s_2)$
    • $(c_1, s_1) \to s_2 \;\Rightarrow\; (\langle c_1 \rangle, s_1) \to s_2$
    • $(b, s_1) \to \mathit{true} \text{ and } (c, s_1) \to s_2 \;\Rightarrow\; (\langle \mathbf{await}\ b\ c \rangle, s_1) \to s_2$
  • The use of higher-abstraction-level synchronization primitives leads to
    • A lower number of possible interleavings
    • Mistakes being less likely
• The design is validated through
  • Reviewing the important interleavings
  • Formal reasoning (invariants, proofs, model checking, …)
• Higher-abstraction-level synchronization primitives are correctly translated to lower-abstraction-level synchronization primitives
  • For example, an atomic primitive is carefully translated to locks and unlocks
• Bug patterns are used to avoid mistakes
• The implementation is tested using ConTest
  • At this stage a good test plan is readily available from the previous development phases
Contest

I taught this course for several years in various formats
• To third-year computer science students
• To professional programmers
  • With and without experience in developing concurrent programs
  • With at least a first degree in computer science
• To testers with various degrees of programming skill

Real-world description of the ticket algorithm (start with something concrete)
• Some stores and government offices use the following method to ensure that customers are served in order of arrival
  • Upon entering the store, a customer draws a number that is larger than the number held by any other customer
  • The customer then waits until all customers holding smaller numbers have been served
• The algorithm is implemented by a number dispenser and by a display indicating which customer is being served
• If the store has one employee behind the service counter, customers are served one at a time in their order of arrival

High-level implementation of the ticket algorithm

    var number := 1, next := 1, turn[1:n] := ([n] 0)
    P[i: 1..n] ::
      do true ->
        <turn[i] := number, number := number + 1>
        <await turn[i] == next>
        critical section
        <next := next + 1>
        non-critical section
      od

Mapping between the two previous abstraction levels (real-world and high-level descriptions)
• <turn[i] := number, number := number + 1>  // a customer obtains a ticket
• <await turn[i] == next>  // customers wait for their turn
• <next := next + 1>  // call the next customer

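One possible Java rendering of the three atomic actions is sketched below, assuming `java.util.concurrent.atomic.AtomicInteger` as the lower-level primitive: `getAndIncrement` plays the role of the atomic number dispenser, and a busy-wait on `next` plays the role of `<await turn[i] == next>`. The per-process `turn[i]` becomes a local variable of each thread. The names `TicketLock` and `demo` are hypothetical, chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TicketLock {
    // var number := 1, next := 1
    private final AtomicInteger number = new AtomicInteger(1);
    private final AtomicInteger next = new AtomicInteger(1);

    // Returns this thread's ticket (the role of turn[i]) once it is being served.
    public int lock() {
        // <turn[i] := number, number := number + 1>: a single atomic fetch-and-add
        int turn = number.getAndIncrement();
        // <await turn[i] == next>: spin until this ticket is called;
        // yield() lets the current ticket holder run on a busy machine
        while (next.get() != turn) {
            Thread.yield();
        }
        return turn;
    }

    // <next := next + 1>: call the next customer (only the holder calls this)
    public void unlock() {
        next.incrementAndGet();
    }

    // Demo: each of `threads` threads enters the critical section `rounds` times.
    public static int demo(int threads, int rounds) {
        TicketLock lock = new TicketLock();
        int[] counter = {0};                  // protected only by the ticket lock
        Thread[] workers = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            workers[k] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    lock.lock();
                    try {
                        counter[0]++;         // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[k].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));    // prints 4000
    }
}
```

Because every update of `next` is a volatile write and every check is a volatile read, the writes a thread makes inside the critical section are visible to the thread holding the next ticket.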
Testing/validating the protocol
• Even with high-level synchronization primitives, there are typically too many interleavings to review one by one
• This is addressed by inductive proofs and invariants
  • Assume process i entered the critical section. Then turn[i] == next right after <await turn[i] == next>, and next does not change until process i executes <next := next + 1> on exit
  • It is easy to prove that turn[i] <> turn[j] if i <> j, turn[i] <> 0 and turn[j] <> 0, because each ticket is drawn atomically from the ever-increasing counter number
  • Thus, as long as process i has not exited the critical section, any other process j that reaches <await turn[j] == next> has turn[j] <> next and has to wait
  • Hence at most one process can be in the critical section
• Students:
  • “We don’t like mathematics and we don’t like proofs; in fact, we hate them”
  • “And by the way, the ticket algorithm is ridiculously simple: it's only a loop with four lines of code”
• Maybe they don’t understand that there is an exponential space of possible interleavings?

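The inductive argument covers every interleaving at once. As a complement (not a substitute), a small explicit-state search can enumerate every reachable state of the high-level algorithm and check both invariants in each state. This sketch assumes two processes, folds the non-critical section into the exit step, and stops issuing tickets after `maxTickets`, so it is a bounded check rather than a proof; the class and method names are illustrative.

```java
import java.util.*;

public class TicketModelCheck {
    // State layout: {number, next, turn0, turn1, pc0, pc1}
    // pc: 0 = about to take a ticket, 1 = at <await turn[i] == next>, 2 = in critical section
    static final int NUMBER = 0, NEXT = 1, TURN = 2, PC = 4;

    // Breadth-first search over all interleavings of two processes.
    // Returns the number of reachable states; throws if an invariant fails.
    public static int explore(int maxTickets) {
        int[] init = {1, 1, 0, 0, 0, 0};
        Deque<int[]> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.add(init);
        seen.add(Arrays.toString(init));
        while (!work.isEmpty()) {
            int[] s = work.poll();
            checkInvariants(s);
            for (int i = 0; i < 2; i++) {
                int[] t = step(s, i, maxTickets);
                if (t != null && seen.add(Arrays.toString(t))) {
                    work.add(t);
                }
            }
        }
        return seen.size();
    }

    // One atomic action of process i, or null if i is blocked or the ticket bound is reached.
    static int[] step(int[] s, int i, int maxTickets) {
        int[] t = s.clone();
        switch (s[PC + i]) {
            case 0: // <turn[i] := number, number := number + 1>
                if (s[NUMBER] > maxTickets) return null;
                t[TURN + i] = s[NUMBER];
                t[NUMBER] = s[NUMBER] + 1;
                t[PC + i] = 1;
                return t;
            case 1: // <await turn[i] == next>
                if (s[TURN + i] != s[NEXT]) return null;
                t[PC + i] = 2;
                return t;
            default: // leave the critical section: <next := next + 1>
                t[NEXT] = s[NEXT] + 1;
                t[PC + i] = 0;
                return t;
        }
    }

    static void checkInvariants(int[] s) {
        if (s[PC] == 2 && s[PC + 1] == 2)
            throw new IllegalStateException("mutual exclusion violated: " + Arrays.toString(s));
        if (s[TURN] != 0 && s[TURN] == s[TURN + 1])
            throw new IllegalStateException("duplicate tickets: " + Arrays.toString(s));
    }

    public static void main(String[] args) {
        System.out.println("reachable states: " + explore(10));
    }
}
```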
Objectives of the course (updated)
• Background
  • The process abstraction, mutual exclusion and conditional synchronization, scheduling policies and fairness, the process life cycle, synchronization primitives (semaphores, monitors), message passing, logical time, examples, …
• Design the protocol through an abstraction
  • Use atomic and atomic-wait primitives
    • $(c_1, s_1) \to s_2 \;\Rightarrow\; (c_1 \parallel c_2, s_1) \to (c_2, s_2)$
    • $(c_1, s_1) \to s_2 \;\Rightarrow\; (\langle c_1 \rangle, s_1) \to s_2$
    • $(b, s_1) \to \mathit{true} \text{ and } (c, s_1) \to s_2 \;\Rightarrow\; (\langle \mathbf{await}\ b\ c \rangle, s_1) \to s_2$
  • The use of higher-abstraction-level synchronization primitives leads to
    • A lower number of possible interleavings
    • Mistakes being less likely
• The design is validated through
  • Systematically representing the set of possible interleavings
    • Typically through the use of Cartesian product models
  • Reviewing the important interleavings
• Higher-abstraction-level synchronization primitives are correctly translated to lower-abstraction-level synchronization primitives
  • For example, an atomic primitive is carefully translated to locks and unlocks
• Bug patterns are used to avoid mistakes
• The implementation is tested using ConTest
  • At this stage a good test plan is readily available from the previous development phases

Helping the students realize that the interleaving space is exponential
• First attempt: counting
  • The number of possible interleavings is enormous
  • For (a;b;c;e;f;g) || (h;i;j;k;l;m), two threads of six non-blocking atomic actions each, the number of possible traces is 12!/(6!*6!) = 924
• Second attempt: riddles
  • 100 threads are executing x++ on a shared variable initialized to 0. What are the possible outcomes?
• Students: “OK, there are many things happening together in parallel and they can occur in many ways, but it is hard, too hard, to think about things happening in parallel”

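Both attempts can be made concrete in a few lines of Java. The sketch below counts the merges of two sequences of atomic actions, and brute-forces the x++ riddle for small n under a simplified model in which x++ is two atomic actions: a read into a thread-local temporary, then a write of temporary + 1. The class and method names are illustrative.

```java
import java.util.*;

public class Interleavings {
    // Number of ways to interleave a and b non-blocking atomic actions:
    // at each step, either the first or the second thread moves.
    public static long count(int a, int b) {
        if (a == 0 || b == 0) return 1;
        return count(a - 1, b) + count(a, b - 1);
    }

    // All final values of x when n threads each run x++ once,
    // with x++ modeled as: tmp := x; x := tmp + 1.
    public static SortedSet<Integer> outcomes(int n) {
        SortedSet<Integer> results = new TreeSet<>();
        explore(new int[n], new int[n], 0, results);
        return results;
    }

    // pc[i]: 0 = before the read, 1 = between read and write, 2 = done.
    private static void explore(int[] pc, int[] tmp, int x, Set<Integer> results) {
        boolean anyEnabled = false;
        for (int i = 0; i < pc.length; i++) {
            if (pc[i] == 2) continue;
            anyEnabled = true;
            int[] pc2 = pc.clone();
            int[] tmp2 = tmp.clone();
            int x2 = x;
            if (pc[i] == 0) {
                tmp2[i] = x;           // read
            } else {
                x2 = tmp[i] + 1;       // write
            }
            pc2[i]++;
            explore(pc2, tmp2, x2, results);
        }
        if (!anyEnabled) results.add(x);   // every thread finished: record x
    }

    public static void main(String[] args) {
        System.out.println(count(6, 6));   // prints 924, i.e. 12!/(6!*6!)
        System.out.println(outcomes(3));   // prints [1, 2, 3]
    }
}
```

Under this model, n threads can end with any value from 1 to n: if all threads read 0 before anyone writes, the result is 1, and letting some threads run serially afterwards yields each larger value. For 100 threads the same reasoning gives 1 to 100, while exhaustive enumeration is already hopeless, which is the point of the riddle.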
Serialization helps understand the algorithm (continued)

Next we implement the protocol
• Students:
  • “Locks are easy to use, no need to read the instructions”

Avoid errors by understanding the synchronization primitives [precise-java]
• In Java, each object is associated with a lock
• Consider the following class:

    class Conflict {
        Conflict(…) {
            synchronized (Conflict.class) { … }
        }
        synchronized static void f(…) { … }
        synchronized void g(…) { … }
        void h(…) {
            synchronized (this) { … }
        }
        void r(…) { … }
    }

• Which of the following pairs of methods, when executing concurrently, can cause a conflict?
  • f || g, f || h, f || r, g || h, g || r, h || r
  • Pairs consisting of the constructor and one of the other methods

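Reading "conflict" as "contend for the same lock", the question can be checked empirically with `Thread.holdsLock`, which reports whether the current thread holds a given monitor. In the sketch below, each member of a simplified, hypothetical `Conflict` class (parameters dropped, bodies replaced) reports which of the two candidate monitors it holds: the `Conflict.class` object or the instance itself.

```java
public class ConflictDemo {
    static class Conflict {
        final String ctorLocks;   // which monitors the constructor body held

        Conflict() {
            synchronized (Conflict.class) {            // locks the Class object
                ctorLocks = held(this);
            }
        }

        static synchronized String f() { return held(null); }     // locks Conflict.class
        synchronized String g() { return held(this); }            // locks this instance
        String h() { synchronized (this) { return held(this); } } // locks this instance
        String r() { return held(this); }                         // locks nothing

        // Reports which candidate monitors the calling thread currently holds.
        static String held(Object instance) {
            boolean cls = Thread.holdsLock(Conflict.class);
            boolean obj = instance != null && Thread.holdsLock(instance);
            if (cls && obj) return "class+this";
            if (cls) return "class";
            if (obj) return "this";
            return "none";
        }
    }

    public static void main(String[] args) {
        Conflict c = new Conflict();
        System.out.println("constructor: " + c.ctorLocks);  // prints class
        System.out.println("f: " + Conflict.f());           // prints class
        System.out.println("g: " + c.g());                  // prints this
        System.out.println("h: " + c.h());                  // prints this
        System.out.println("r: " + c.r());                  // prints none
    }
}
```

Under that reading, only f and the constructor (both lock `Conflict.class`) and g and h on the same instance contend. All other pairs use different monitors or none, so they never block each other; the flip side is that they provide no mutual exclusion for any data they share.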
Translating from abstract to concrete: implementation pitfalls are explained
• The difference between atomicity and locking
  • What protection does synchronized(o){x++} provide when it runs in parallel to an unprotected x++?
• When translating an atomic block to locks/unlocks, we need to identify all program locations that contend on the shared resource
• Check that the lock was actually obtained; an unchecked lock() followed by unlock() is not good
• Check that the lock is released along all error paths
• What happens if a signal is taken while in the critical section (pthreads)?
• What happens if an interrupt exception is taken while in wait()?

    try {
        synchronized (o) {
            o.wait();
        }
    } catch (Exception e) {
    }

• When an atomic conditional wait is implemented, we typically introduce a race, so the condition must be rechecked once inside the critical section
• Teaching pitfalls is highly effective in shortening the learning curve

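A sketch of how the atomic actions of the ticket algorithm might be translated to explicit locks, assuming `java.util.concurrent.locks.ReentrantLock` and a `Condition` as the lower-level primitives. It illustrates three of the pitfalls above: every `unlock()` sits in a `finally` block so it runs on all error paths, the awaited condition is rechecked in a `while` loop after every wakeup, and every access to `number` and `next` happens under the same lock. `AwaitTranslation` and `demo` are hypothetical names.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitTranslation {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition nextChanged = lock.newCondition();
    private int number = 1, next = 1;          // guarded by lock

    // <turn[i] := number, number := number + 1>
    public int takeTicket() {
        lock.lock();
        try {
            return number++;
        } finally {
            lock.unlock();                     // runs on every path, including exceptions
        }
    }

    // <await turn[i] == next>
    public void awaitTurn(int turn) {
        lock.lock();
        try {
            // Another thread may run between signalAll() and our reacquiring the
            // lock (and wakeups may be spurious), so recheck in a loop, not an if.
            while (turn != next) {
                nextChanged.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }

    // <next := next + 1>
    public void callNext() {
        lock.lock();
        try {
            next++;
            nextChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Demo: each thread enters the critical section `rounds` times.
    public static int demo(int threads, int rounds) {
        AwaitTranslation protocol = new AwaitTranslation();
        int[] counter = {0};                   // protected by the ticket protocol
        Thread[] workers = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            workers[k] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    protocol.awaitTurn(protocol.takeTicket());
                    try {
                        counter[0]++;          // critical section
                    } finally {
                        protocol.callNext();   // always call the next customer
                    }
                }
            });
            workers[k].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));     // prints 4000
    }
}
```

`awaitUninterruptibly()` is a deliberate choice here: a thread that abandoned its ticket after an interrupt would leave `next` stuck and block every later customer.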
Hiding the protocol implementation
• Prepare a general synchronization service for the system, located in a separate class (see the picture on the right)
• Students: “OK, but we’ll implement the protocol all over the place anyway”
• Hard to teach without experience with large real-life systems
• Hard to suggest to engineers who maintain an existing system that is not structured that way
  • If it's not broken, don’t fix it…

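A minimal sketch of such a service, assuming the ticket protocol from the earlier slides is the one being hidden. Callers pass the critical section as a lambda and never see tickets, locks, `wait()` or `notifyAll()`; the class name `SyncService` and the method `inTicketOrder` are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical service: the only class in the system that knows the protocol.
public final class SyncService {
    private int number = 1, next = 1;          // guarded by this object's monitor

    // Runs criticalSection when the caller's ticket is called; returns its result.
    public <T> T inTicketOrder(Supplier<T> criticalSection) {
        synchronized (this) {
            int turn = number++;               // obtain a ticket
            boolean interrupted = false;
            while (turn != next) {             // wait for our turn, rechecking after each wakeup
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true;        // keep the ticket; restore the flag below
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
        try {
            return criticalSection.get();
        } finally {
            synchronized (this) {
                next++;                        // call the next customer
                notifyAll();
            }
        }
    }

    // Usage: callers only see inTicketOrder.
    public static int demo(int threads, int rounds) {
        SyncService service = new SyncService();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            workers[k] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    service.inTicketOrder(() -> ++counter[0]);
                }
            });
            workers[k].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));     // prints 4000
    }
}
```

Keeping the protocol in one class means a later change (say, swapping the monitor for a `ReentrantLock`) touches one file instead of every call site.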
Testing
• Running a test that has a concurrency problem many times does not necessarily expose the problem
  • Especially in unit test environments
  • Easy to demonstrate through examples
• Create an “empty test” in which the synchronization primitives used are mapped to no-ops, and show that the protocol still “works fine”
  • Best practice: your test should at least expose a problem with the “empty implementation”
• Running black-box tests that have the required contention (e.g., customers accessing the ticketing system simultaneously) does not necessarily produce the white-box contention you are after, such as
  • The blocking in <await turn[i] == next> both occurring and not occurring
  • A context switch occurring right before and right after <await turn[i] == next>
• Defining the coverage tasks you are after and checking their coverage helps
  • E.g., ConTest synchronization coverage

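The "empty implementation" idea can be sketched with a hypothetical `Mutex` seam through which the code obtains its lock, so a test can swap in a no-op version. The test forces the white-box event it is after, a context switch between the read and the write of x++, by parking each thread at a `CyclicBarrier` with a timeout. The barrier is only a stand-in for the kind of noise a tool like ConTest injects, not ConTest's API; all names here are illustrative.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class EmptyImplementationTest {
    // Hypothetical seam through which the protocol gets its lock.
    interface Mutex { void lock(); void unlock(); }

    static Mutex realMutex() {
        ReentrantLock l = new ReentrantLock();
        return new Mutex() {
            public void lock() { l.lock(); }
            public void unlock() { l.unlock(); }
        };
    }

    // The "empty implementation": synchronization mapped to no-ops.
    static Mutex emptyMutex() {
        return new Mutex() {
            public void lock() { }
            public void unlock() { }
        };
    }

    // Two threads each run x++ once; the barrier between the read and the
    // write tries to force a context switch at exactly that point.
    static int twoIncrements(Mutex m) {
        int[] x = {0};
        CyclicBarrier switchPoint = new CyclicBarrier(2);
        Runnable increment = () -> {
            m.lock();
            try {
                int tmp = x[0];                                   // read
                try {
                    switchPoint.await(1, TimeUnit.SECONDS);
                } catch (TimeoutException | BrokenBarrierException e) {
                    // The other thread could not get here: it is blocked on the lock.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                x[0] = tmp + 1;                                   // write
            } finally {
                m.unlock();
            }
        };
        Thread t1 = new Thread(increment), t2 = new Thread(increment);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return x[0];
    }

    public static void main(String[] args) {
        System.out.println("real lock:  x = " + twoIncrements(realMutex()));
        System.out.println("empty lock: x = " + twoIncrements(emptyMutex()));
    }
}
```

With the real lock, the second thread cannot reach the barrier, the first thread times out and breaks it, and x ends at 2. With the no-op lock, both threads normally read 0 before either writes, so one update is lost and x ends at 1: a test that passes in both cases is not testing the synchronization.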
Exercises: knowing the synchronization primitives (Java)
• 100 threads execute i++ where i is a global variable. Describe all possible outcomes
• The following thread is interrupted while waiting at foo.wait():

    try {
        synchronized (foo) {
            foo.wait();
        }
    } catch (Exception e) {
    }

  Is the thread still holding the lock, and is the thread's interrupt bit set, when it reaches the catch block? What are the answers to the same questions if we change the program to:

    synchronized (foo) {
        try {
            foo.wait();
        } catch (Exception e) {
        }
    }

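The answers can be observed directly. According to the `java.lang.Object.wait` documentation, an interrupted `wait()` throws `InterruptedException` only after the thread has reacquired the monitor, and the interrupt status is cleared when the exception is thrown. The sketch below records `Thread.holdsLock(foo)` and `isInterrupted()` inside each catch block; unlike the exercise, it catches `InterruptedException` rather than `Exception`, and loops on `wait()` so a spurious wakeup cannot end the wait early. The class and method names are illustrative.

```java
public class InterruptDuringWait {
    static final Object foo = new Object();

    // What the thread observes in its catch block.
    static final class Observation {
        final boolean holdsLock, interruptBitSet;
        Observation(boolean holdsLock, boolean interruptBitSet) {
            this.holdsLock = holdsLock;
            this.interruptBitSet = interruptBitSet;
        }
        @Override public String toString() {
            return "holds lock: " + holdsLock + ", interrupt bit set: " + interruptBitSet;
        }
    }

    static Observation observe() {
        return new Observation(Thread.holdsLock(foo), Thread.currentThread().isInterrupted());
    }

    // catchInside == false: try { synchronized (foo) { foo.wait(); } } catch ...
    // catchInside == true:  synchronized (foo) { try { foo.wait(); } catch ... }
    static Observation run(boolean catchInside) {
        Observation[] seen = new Observation[1];
        Thread waiter = new Thread(() -> {
            if (catchInside) {
                synchronized (foo) {
                    try {
                        while (true) foo.wait();
                    } catch (InterruptedException e) {
                        seen[0] = observe();
                    }
                }
            } else {
                try {
                    synchronized (foo) {
                        while (true) foo.wait();
                    }
                } catch (InterruptedException e) {
                    seen[0] = observe();
                }
            }
        });
        waiter.start();
        waiter.interrupt();   // if this lands before wait(), wait() throws at once
        try {
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // prints: holds lock: false, interrupt bit set: false
        System.out.println("catch outside synchronized: " + run(false));
        // prints: holds lock: true, interrupt bit set: false
        System.out.println("catch inside synchronized:  " + run(true));
    }
}
```

In the first version the monitor is released as the exception propagates out of the `synchronized` block, so the catch runs without the lock; in the second the catch is still inside the block. In both, the interrupt bit is already cleared, so code that needs to remember the interrupt must restore it with `Thread.currentThread().interrupt()`.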
Exercises: knowing the synchronization primitives (Java)
• What happens if one thread executes the following method recursively, e.g., by executing factorial(7)?

    synchronized int factorial(int i) {
        if (i == 0) return 1;
        else return i * factorial(i - 1);
    }

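Java's intrinsic locks are reentrant: a thread that already holds a monitor can acquire it again, and the monitor is released only when the outermost synchronized call returns. So factorial(7) returns 5040 rather than deadlocking. The sketch below runs the exercise's method and, for comparison, the same recursion with a `ReentrantLock`, whose `getHoldCount()` makes the nesting depth visible; `factorialWithLock` and `maxHoldCount` are illustrative additions.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantFactorial {
    // The method from the exercise: each recursive call re-acquires the
    // monitor of `this`, which the calling thread already holds.
    synchronized int factorial(int i) {
        if (i == 0) return 1;
        else return i * factorial(i - 1);
    }

    // The same recursion with an explicit lock that exposes its hold count.
    private final ReentrantLock lock = new ReentrantLock();
    int maxHoldCount = 0;

    int factorialWithLock(int i) {
        lock.lock();
        try {
            maxHoldCount = Math.max(maxHoldCount, lock.getHoldCount());
            return i == 0 ? 1 : i * factorialWithLock(i - 1);
        } finally {
            lock.unlock();   // one unlock per lock; released fully at the outermost call
        }
    }

    public static void main(String[] args) {
        ReentrantFactorial f = new ReentrantFactorial();
        System.out.println(f.factorial(7));          // prints 5040
        System.out.println(f.factorialWithLock(7));  // prints 5040
        System.out.println(f.maxHoldCount);          // prints 8 (calls for i = 7..0)
    }
}
```

With a non-reentrant lock, the second acquisition by the same thread would block forever.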
Will Parallel Programming Become Common Knowledge and the Parallel Programmer the Programmer of the Future?
• It is hard to teach the development and verification of parallel programs to novices
  • Comprehending the space of possible interleavings is hard
  • Accurately and correctly defining the behavior of many threads acting in parallel is hard
• With the introduction of multi-core processors, there is an increasing need for programmers who are able to reliably develop parallel programs
• But maybe a different solution is possible?
  • Can we avoid the need for the parallel programmer?
  • Can the compiler or the programming language encapsulate the difficulties of parallelism and put the genie back in the bottle?
• Will parallel programming become common knowledge, and the parallel programmer the agent of the next revolution in programming paradigms?

Will Parallel Programming Become Common Knowledge and the Parallel Programmer the Programmer of the Future? (continued)
• How will future multi-core systems be programmed? How well do existing primitives address various application domains, and how well do they coexist? (3)
• What is the role of high-level primitives (e.g., the transaction model)? Can they hide performance issues? (3)
• Is the major difficulty in parallel programming testing the programs? (2)
• How do we address students' huge difficulties in predicting possible interleavings, especially the unwanted ones? (2)
• What courses should be added to the curriculum, and what should be taught on the job? (2)
• What is the minimum knowledge one needs if the underlying program is parallel? To be more specific, most programmers probably know close to nothing about compiler optimization and processor structure. Will they need more knowledge in the future, or can the details be hidden from them? (1)
• What will be the minimum knowledge needed by a parallel programmer, and how will he or she acquire it, with emphasis on testing/debugging? (1)