CSE 532 Fall 2013 Midterm Exam • 80 minutes, during the usual lecture/studio time: 10:10am to 11:30am on Monday October 21, 2013 • Arrive early if you can, exam will begin promptly at 10:10am • Held in Bryan 305 (NOT URBAUER 218) • You may want to locate the exam room in advance • Exam is open book, open notes, hard copy only • I will bring a copy each of the required and optional texts for people to come up to the front and take a look at as needed • Please feel free to print and bring in slides, your notes, etc. • ALL ELECTRONICS MUST BE OFF DURING THE EXAM (including phones, iPads, laptops, tablets, etc.)
What is Generic Programming? • An abstraction technique for algorithms • Argument types are as general as possible • Separates algorithm steps & argument properties • What GoF design pattern does this resemble? • Type requirements can be specified, systematized • Can cluster requirements into abstractions • Termed “concepts” • Concepts can refine other concepts • Captures relationships between concepts • Analogy to inheritance hierarchies • A type that meets a concept’s requirements • Is a “model” of the concept • Can be plugged in to meet that set of requirements
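As a minimal sketch of the idea (the name `find_first` is hypothetical, not from the slides), the algorithm below states its steps once and leaves the argument types open. Any iterator type that models an input iterator, and any value type comparable with `==`, can be plugged in.

```cpp
#include <list>
#include <vector>

// Hypothetical generic algorithm. Requirements on the arguments:
// Iter models an input iterator, and *first == value is a valid expression.
template <typename Iter, typename T>
Iter find_first(Iter first, Iter last, const T& value) {
    for (; first != last; ++first) {
        if (*first == value) {
            return first;  // position of the first match
        }
    }
    return last;  // no match found
}
```

The same function works for `std::vector<int>` and `std::list<int>` because both iterator types model the required concept.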
Construct a Thread to Launch It • The std::thread constructor takes any callable type • A function, function pointer, function object, or lambda • This includes things like member function pointers, etc. • Additional constructor arguments passed to callable instance • Watch out for argument passing semantics, though • Constructor arguments are copied locally without conversion • Need to wrap references in std::ref, force conversions, etc. • Default construction (without a thread) also possible • Can transfer ownership of it via C++11 move semantics, i.e., using std::move from one std::thread object to another • Be careful not to move thread ownership to a std::thread object that already owns one (terminates the program)
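The argument passing and ownership rules above can be sketched as follows. The names `add_into` and `launch_and_move` are hypothetical and chosen only for illustration.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Hypothetical worker that writes its result through a reference parameter.
void add_into(int& total, int amount) { total += amount; }

int launch_and_move() {
    int total = 0;
    // Constructor arguments are copied, so the reference must be wrapped
    // in std::ref to reach the caller's variable.
    std::thread t(add_into, std::ref(total), 5);
    std::thread u;     // default constructed: owns no thread
    u = std::move(t);  // safe, because u did not already own a thread
    u.join();          // t no longer owns anything; join through u
    return total;
}
```

Move-assigning into a `std::thread` that still owns a running thread would call `std::terminate`, which is why `u` is left empty before the move.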
Always Join or Detach a Launched Thread • Often you should join with each launched thread • E.g., to wait until a result it produces is ready to retrieve • E.g., to keep a resource it needs available for its lifetime • However, for truly independent threads, can detach • Relinquishes parent thread’s option to rendezvous with it • Need to copy all resources into the thread up front • Avoids circular wait deadlocks (if A joins B and B joins A) • Need to ensure join or detach for each thread • E.g., if an exception is thrown, still need to make it so • The guard (a.k.a. RAII) idiom helps with this, since guard’s destructor always joins or detaches if needed • The std::thread::joinable() method can be used to test that
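A minimal sketch of the guard idiom, assuming a hypothetical `thread_guard` class that always joins: the destructor runs during stack unwinding, so the thread is joined even when an exception is thrown.

```cpp
#include <stdexcept>
#include <thread>

// Hypothetical guard: joins the referenced thread on scope exit if needed.
class thread_guard {
public:
    explicit thread_guard(std::thread& t) : t_(t) {}
    ~thread_guard() {
        if (t_.joinable()) {
            t_.join();
        }
    }
    thread_guard(const thread_guard&) = delete;
    thread_guard& operator=(const thread_guard&) = delete;

private:
    std::thread& t_;
};

bool join_despite_exception() {
    int result = 0;
    std::thread t([&result] { result = 42; });
    try {
        thread_guard g(t);
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // g's destructor has already joined t during unwinding
    }
    return !t.joinable() && result == 42;
}
```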
Design for Multithreaded Programming • Concurrency • Logical (single processor): instruction interleaving • Physical (multi-processor): parallel execution • Safety • Threads must not corrupt objects or resources • More generally, bad inter-leavings must be avoided • Atomic: runs to completion without being preempted • Granularity at which operations are atomic matters • Liveness • Progress must be made (deadlock is avoided) • Goal: full utilization (something is always running)
Multi-Threaded Design, Continued • Race conditions (threads racing for access) • Two or more threads access an object/resource • The interleaving of their statements matters • Some interleavings have bad consequences • Example (critical sections) • Object has two variables x ∈ {A,C}, y ∈ {B,D} • Allowed states of the object are AB or CD • Assume each write is atomic, but writing both is not • Thread t writes x = A; and is then preempted • Thread u writes x = C; y = D; and blocks • Thread t writes y = B; • Object is left in an inconsistent state, CB
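One way to rule out the inconsistent state CB is to make the pair of writes a critical section. The sketch below assumes a hypothetical `pair_state` class guarded by a `std::mutex`; it is one possible fix, not the only one.

```cpp
#include <mutex>
#include <thread>

// Hypothetical object whose only allowed states are AB and CD.
class pair_state {
public:
    void set(char x, char y) {
        std::lock_guard<std::mutex> lock(m_);  // both writes happen under one lock
        x_ = x;
        y_ = y;
    }
    bool consistent() {
        std::lock_guard<std::mutex> lock(m_);
        return (x_ == 'A' && y_ == 'B') || (x_ == 'C' && y_ == 'D');
    }

private:
    std::mutex m_;
    char x_ = 'A';
    char y_ = 'B';
};

bool run_writers() {
    pair_state s;
    std::thread t([&s] { for (int i = 0; i < 1000; ++i) s.set('A', 'B'); });
    std::thread u([&s] { for (int i = 0; i < 1000; ++i) s.set('C', 'D'); });
    bool ok = true;
    for (int i = 0; i < 1000; ++i) {
        ok = ok && s.consistent();  // observers never see CB or AD
    }
    t.join();
    u.join();
    return ok && s.consistent();
}
```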
Multi-Threaded Programming, Continued • Deadlock • One or more threads access an object/resource • Access to the resource is serialized • Chain of accesses leads to mutual blocking • Single-threaded example (“self-deadlock”) • A thread acquires then tries to reacquire same lock • If lock is not recursive thread blocks itself • Two thread example (“deadly embrace”) • Thread t acquires lock j, thread u acquires lock k • Thread t tries to acquire lock k, blocks • Thread u tries to acquire lock j, blocks
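The "deadly embrace" can be avoided by acquiring both locks in one step. This sketch uses `std::lock`, which locks several mutexes without deadlock; the `account` type and `transfer` function are hypothetical.

```cpp
#include <mutex>
#include <thread>

// Hypothetical resource with its own lock.
struct account {
    std::mutex m;
    int balance = 100;
};

void transfer(account& from, account& to, int amount) {
    if (&from == &to) {
        return;  // locking the same non-recursive mutex twice would self-deadlock
    }
    std::lock(from.m, to.m);  // acquires both, regardless of argument order
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

int run_transfers() {
    account a, b;
    // The two threads name the locks in opposite orders, as in the slide.
    std::thread t([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread u([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t.join();
    u.join();
    return a.balance + b.balance;
}
```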
Atomic Types • Many atomic types in C++11, at least some lock-free • Always lock-free: std::atomic_flag • If it matters, must test others with is_lock_free() • Also can specialize std::atomic<> class template • This is already done for many standard non-atomic types • Can also do this for your own types that implement a trivial copy-assignment operator and are bitwise equality comparable • Watch out for semantic details • E.g., bitwise evaluation of float, double, etc. representations • Equivalence may differ under atomic operations
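As a small illustration of `std::atomic_flag`, the sketch below builds a hypothetical `spin_lock` on top of it and uses it to protect a plain counter. Whether other atomic types are lock-free is platform dependent, so the code does not assume it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin lock built on the one always-lock-free atomic type.
class spin_lock {
public:
    void lock() {
        // test_and_set returns the previous value: spin while it was already set
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

int count_with_spin_lock() {
    spin_lock lock;
    int counter = 0;  // non-atomic, protected by the spin lock
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t(work);
    std::thread u(work);
    t.join();
    u.join();
    return counter;
}
```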
Reasoning about Concurrency • Operations on atomic types semantically well defined • Synchronizes-with relationship ensures that (unambiguously) operation X happens before or after operation Y • Can leverage this so eventually Xi happens-before Yj • Transitivity then lets you build various happens-before cases, including inter-thread happens-before relationships • Other variations on this theme are also useful • Dependency-ordered-before and carries-a-dependency-to are used to reason about cases involving data dependencies • I.e., the result of one atomic operation is used in another
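A minimal sketch of a synchronizes-with relationship, using hypothetical names: the release store to `ready` synchronizes with the acquire load that reads `true`, so the earlier write to the non-atomic `payload` happens-before the later read.

```cpp
#include <atomic>
#include <thread>

int publish_and_consume() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);  // atomic flag that carries the ordering
    int seen = -1;
    std::thread producer([&] {
        payload = 42;                                  // sequenced before the store
        ready.store(true, std::memory_order_release);  // operation X
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // operation Y
        }
        seen = payload;  // ordered after the producer's write by transitivity
    });
    producer.join();
    consumer.join();
    return seen;
}
```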
Memory Models and Design • Trading off stricter ordering vs. higher overhead • Sequential consistency is easiest to think about (implement) because it imposes a consistent global total order on threads • Acquire-release consistency relaxes that to a pair-wise partial order that admits more concurrency • Relaxed ordering is least expensive, but should be applied selectively where available to optimize performance • Even these basic constructs allow sophisticated design • Memory fences to synchronize otherwise relaxed segments • Release chains offer a similar idea for data dependencies • Mixing atomic operations in with non-atomic ones to reduce overhead, but still enforce correct semantics (Chapter 7)
Lock-Free and Wait-Free Semantics • Lock-free behavior never blocks (but may live-lock) • Suspension of one thread doesn’t impede others’ progress • Tries to do something, if cannot just tries again • E.g., while(!head.compare_exchange_weak(n->next,n)); • Wait-free behavior never starves a thread • Progress of each is guaranteed (bounded number of retries) • Lock-free data structures try for maximum concurrency • E.g., ensuring some thread makes progress at every step • May not be strictly wait-free but that’s something to aim for • Watch out for performance costs in practice • E.g., atomic operations are slower, may not be worth it • Some platforms may not relax memory consistency well
Lock-Free Stack Case Study • Even simplest version of push requires careful design • Allocate/initialize, then swap pointers via atomic operations • Need to deal with memory reclamation • Three strategies: thread counts, hazard pointers, ref counts • Memory models offer potential performance gains • E.g., if relaxed or acquire/release consistency is in fact weaker on the particular platform for which you’re developing • Need to profile that, e.g., as we did in the previous studio • Resulting lock free stack design is a good approach • E.g., Listing 7.12 in [Williams] • Please feel free to use (with appropriate citation comments) in your labs (we’ll code this up in the studio exercises)
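The simplest push can be sketched as below. This is not Listing 7.12 from [Williams]: it is a reduced, push-only illustration with hypothetical names, using the default sequentially consistent ordering, and it sidesteps memory reclamation by deleting nodes only in the destructor, when no other thread is active.

```cpp
#include <atomic>
#include <thread>

struct node {
    int value;
    node* next;
};

// Hypothetical push-only stack: no pop, so no reclamation or ABA issues arise.
class push_only_stack {
public:
    ~push_only_stack() {
        node* n = head_.load();
        while (n != nullptr) {
            node* next = n->next;
            delete n;
            n = next;
        }
    }
    void push(int value) {
        // Allocate and initialize first, then swap pointers atomically.
        node* n = new node{value, head_.load()};
        // On failure n->next is refreshed with the current head, then retry.
        while (!head_.compare_exchange_weak(n->next, n)) {
        }
    }
    int size() const {  // only meaningful once all pushing threads have finished
        int count = 0;
        for (node* n = head_.load(); n != nullptr; n = n->next) {
            ++count;
        }
        return count;
    }

private:
    std::atomic<node*> head_{nullptr};
};

int push_from_two_threads() {
    push_only_stack s;
    auto work = [&s] { for (int i = 0; i < 1000; ++i) s.push(i); };
    std::thread t(work);
    std::thread u(work);
    t.join();
    u.join();
    return s.size();
}
```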
Lock-Free Queue Case Study • Contention differs in lock-free queue vs. stack • Enqueue/dequeue contention depends on how many nodes are in the queue, whereas push/pop contend unless empty • Synchronization needs (and thus design) are different • Application use cases also come into play • E.g., single-producer single-consumer queue is much simpler and may be all that is needed in some cases • Service configuration, template meta-programming, other approaches can enforce necessary properties of its use • Multi-thread-safe enqueue and dequeue operations • Modifications (to e.g., reference-counting) may be needed • May need to use work-stealing to be lock free (!)
Lock Free Design Guidelines • Prototype data structures using sequential consistency • Then analyze and test thread-safety thoroughly • Then look for meaningful opportunities to relax consistency • Use a lock-free memory reclamation scheme • Count threads and then delete when quiescent • Use hazard pointers to track threads’ accesses to an object • Reference count and delete in a thread-safe way • Detach garbage and delegate deletion to another thread • Watch out for the ABA problem • E.g., with coupled variables, pop-push-pop issues • Identify busy waiting, then steal or delegate work • E.g., if thread would be blocked, “help it over the fence”
Dividing Work Between Threads • Static partitioning of data can be helpful • Makes threads (mostly) independent, ahead of time • Threads can read from and write to their own locations • Some partitioning of data is necessarily dynamic • E.g., Quicksort uses a pivot at run-time to split up data • May need to launch (or pass data to) a thread at run-time • Can also partition work by task-type • E.g., hand off specific kinds of work to specialized threads • E.g., a thread-per-stage pipeline that is efficient once primed • Number of threads to use is a key design challenge • E.g., std::thread::hardware_concurrency() is only a starting point (blocking, scheduling, etc. also matter)
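Static partitioning can be sketched with a hypothetical `parallel_sum` function. Each thread reads its own slice and writes its own result slot, and `std::thread::hardware_concurrency()` is treated only as a hint, since it may return 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

long parallel_sum(const std::vector<int>& data) {
    unsigned hint = std::thread::hardware_concurrency();  // 0 if unknown
    std::size_t workers = hint == 0 ? 2 : hint;
    // Never use more threads than there are elements (but at least one).
    workers = std::min<std::size_t>(workers, std::max<std::size_t>(data.size(), 1));
    std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<long> partial(workers, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t begin = std::min(data.size(), w * chunk);
        std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

The adjacent slots in `partial` may share a cache line; each thread writes its slot only once here, so the cost is small in this sketch.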
Factors Affecting Performance • Need at least as many threads as hardware cores • Too few threads makes insufficient use of the resource • Oversubscription increases overhead due to task switching • Need to gauge for how long (and when) threads are active • Data contention and cache ping-pong • Performance degrades rapidly as cache misses increase • Need to design for low contention for cache lines • Need to avoid false sharing of elements (in same cache line) • Packing or spreading out data may be needed • E.g., localize each thread’s accesses • E.g., separate a shared mutex from the data that it guards
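Spreading data out to avoid false sharing can be sketched as follows. The 64-byte cache line size is an assumption (the real value is platform specific), and `padded_counter` is a hypothetical name.

```cpp
#include <thread>

constexpr int kWorkers = 4;

// Assumed cache line size of 64 bytes: each counter gets a line of its own.
struct alignas(64) padded_counter {
    long value = 0;
};

long count_without_false_sharing(long per_worker) {
    padded_counter counters[kWorkers];  // one counter per thread, one line each
    std::thread threads[kWorkers];
    for (int w = 0; w < kWorkers; ++w) {
        threads[w] = std::thread([&counters, w, per_worker] {
            for (long i = 0; i < per_worker; ++i) {
                ++counters[w].value;  // touches only this thread's cache line
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    long total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

Whether the padding pays off should be measured on the target platform rather than assumed.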
Additional Considerations • Exception safety • Affects both lock based and lock-free synchronization • Use std::packaged_task and std::future to allow for an exception being thrown in a thread (see listing 8.3) • Scalability • How much of the code is actually parallelizable? • Various theoretical formulas (including Amdahl’s) apply • Hiding latency • If nothing ever blocks you may not need concurrency • If something does, concurrency makes parallel progress • Improving responsiveness • Giving each thread its own task may simplify, speed up tasks
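The exception safety point can be illustrated with a short sketch (this is not listing 8.3; `risky_task` and `run_and_report` are hypothetical). An exception thrown inside the task is stored in the shared state and rethrown in the thread that calls `get` on the future.

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical task that always fails.
int risky_task() { throw std::runtime_error("task failed"); }

std::string run_and_report() {
    std::packaged_task<int()> task(risky_task);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));  // the exception is caught by the task wrapper
    worker.join();
    try {
        result.get();  // rethrows the stored exception in this thread
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no exception";
}
```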
Thread Pools • Simplest version • A thread per core, all run a common worker thread function • Waiting for tasks to complete • Promises and futures give rendezvous with work completion • Could also post work results on an active object’s queue, which also may help avoid cache ping-pong • Futures also help with exception safety, e.g., a thrown exception propagates to thread that calls get on the future • Granularity of work is another key design decision • Too small and the overhead of managing the work adds up • Too coarse and responsiveness, concurrency may suffer • Work stealing lets idle threads relieve busy ones • May need to hand off promise as well as work, etc.
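A minimal sketch of the simplest version, assuming a hypothetical `simple_pool` class with a single mutex-protected queue (no work stealing). Each submitted task is wrapped in a `std::packaged_task` so the caller can rendezvous with completion through a future.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class simple_pool {
public:
    explicit simple_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });  // common worker function
        }
    }
    ~simple_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) {
            w.join();
        }
    }
    std::future<int> submit(std::function<int()> f) {
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(f));
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // shutting down and the queue is drained
                }
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::vector<std::thread> workers_;
};
```

A caller might write `simple_pool pool(2); auto f = pool.submit([] { return 6 * 7; });` and later call `f.get()` to wait for the result.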
Interrupting Threads (Part I) • Thread with interruption point is (cleanly) interruptible • Another thread can set a flag that it will notice and then exit • Clever use of lambdas, promises, move semantics lets a thread-local interrupt flag be managed (see listing 9.9) • Need to be careful to avoid dangling pointers on thread exit • For simple cases, detecting interruption may be trivial • E.g., event loop with interruption point checked each time • For condition variables interruption is more complex • E.g., using the guard idiom to avoid exception hazards • E.g., waiting with a timeout (and handling spurious wakes) • Can eliminate spurious wakes with a scheme based on a custom lock and a condition_variable_any (listing 9.12)
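The simple event loop case can be sketched with a shared atomic flag. This is a deliberately reduced illustration with hypothetical names, not the thread-local scheme of listing 9.9: the loop checks its interruption point once per pass and exits cleanly when the flag is set.

```cpp
#include <atomic>
#include <thread>

bool interrupt_event_loop() {
    std::atomic<bool> interrupted(false);
    std::atomic<bool> started(false);
    bool exited_cleanly = false;
    std::thread worker([&] {
        started.store(true);
        while (!interrupted.load()) {   // interruption point, checked each pass
            std::this_thread::yield();  // stand-in for handling one event
        }
        exited_cleanly = true;          // normal exit path after interruption
    });
    while (!started.load()) {
        std::this_thread::yield();      // wait until the loop is running
    }
    interrupted.store(true);            // another thread sets the flag
    worker.join();                      // combine interruption with joining
    return exited_cleanly;
}
```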
Interrupting Threads (Part II) • Unlike condition variable waits, thread interruption with other blocking calls goes back to timed waiting • No access to internals of locking and unlocking semantics • Best you can do is unblock and check frequently (with interval chosen to balance overhead and responsiveness) • Handling interruptions • Can use standard exception handling in interrupted thread • Can use promises and futures between threads to propagate • Put a catch block in the wrapper that initializes the interrupt flag, so uncaught exception doesn’t end the entire program • Can combine interruption and joining with threads • E.g., to stop background threads and wait for them to end
Concurrency Related Bugs • Deadlock occurs when a thread never unblocks • Complete deadlock occurs when no thread ever unblocks • Blocking I/O can be problematic (e.g., if input never arrives) • Livelock is similar but involves futile effort • Threads are not blocked, but never make real progress • E.g., if a condition never occurs, or with protocol bugs • Data races and broken invariants • Can corrupt data, dangle pointers, double free, leak data • Lifetime of thread relative to its data also matters • If thread exits without freeing resources they can leak • If resources are freed before thread is done with them (or even gains access to them) behavior may be undefined
Locating Concurrency Related Bugs • Inspection can be useful but easily misses subtle bugs • Any possible sequence of relevant actions may matter • Explanation/modeling can be even more powerful • Speculate about how different sequences can manifest • Even (or especially) unlikely ones: what if another thread…? • Gain experience with different races, deadlocks • Try those on for size with respect to code you’re testing • E.g., ABA issues, circular waits, etc. • Hypothesize, predict, instrument & observe, repeat • The scientific method is the most powerful debugging tool • Develop concurrency related regression test suites • Invest in testing harnesses that drive event sequences, etc. (e.g., boost statecharts may let you automate some of this)
Design for Testability • Consider doing formal modeling of concurrency • E.g., for model checking of temporal or timed temporal logic • Good tools exist to help with this (SPIN, UPPAAL, IF, etc.) • At least consider what tests you’ll run as part of design • Can help you avoid concurrency design mistakes initially • Can help you maintain regression tests as code evolves: e.g. how likely will you spot a newly introduced race in old code? • Design for pluggable concurrency • Single threaded vs. logically vs. physically concurrent • A pluggable scheduler and modular units of work can help • Taken to its extreme do combination simulation testing • Combining all of the above potentially lets you explore, test, and then reproduce different concurrency scenarios reliably
What is a Pattern Language? • A narrative that composes patterns • Not just a catalog or listing of the patterns • Reconciles design forces between patterns • Provides an outline for design steps • A generator for a complete design • Patterns may leave consequences • Other patterns can resolve them • Generative designs resolve all forces • Internal tensions don’t “pull design apart”
Categories of Patterns (for CSE 532) • Service Access and Configuration • Appropriate programming interfaces/abstractions • Event Handling • Inescapable in networked systems • Concurrency • Exploiting physical and logical parallelism • Synchronization • Managing safety and liveness in concurrent systems
Wrapper Facade • Combines related functions/data (OO, generic) • Used to adapt existing procedural APIs • Offers better interfaces • Concise, maintainable, portable, cohesive, type safe • Example: the procedural calls pthread_create (thread, attr, start_routine, arg); pthread_exit (status); pthread_cancel (thread); … are wrapped by a thread class offering thread (); thread (function, args); ~thread(); join(); …
Asynchronous Completion Token Pattern • A service (eventually) passes a “cookie” to client • Examples with C++11 futures and promises • A future (eventually) holds ACT (or an exception) from which initiator can obtain the result • Client thread can block on a call to get the data or can repeatedly poll (with timeouts if you’d like) for it • A future can be packaged up with an asynchronously running service in several ways • Directly: e.g., returned by std::async • Bundled: e.g., via a std::packaged_task • As a communication channel: e.g., via std::promise • A promise can be kept or broken • If broken, an exception is thrown to client
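Two of the packaging options, and the broken promise case, can be sketched as follows (all function names are hypothetical). A promise destroyed without a value stores a `std::future_error` that the client sees when it calls `get`.

```cpp
#include <future>
#include <thread>
#include <utility>

// Directly: the future is returned by std::async.
int via_async() {
    return std::async(std::launch::async, [] { return 7; }).get();
}

// As a communication channel: the service thread keeps the promise.
int via_promise() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread service([](std::promise<int> q) { q.set_value(9); },
                        std::move(p));
    service.join();
    return f.get();
}

// A broken promise: the exception is thrown to the client on get().
bool broken_promise_throws() {
    std::future<int> f;
    {
        std::promise<int> p;
        f = p.get_future();
    }  // p is destroyed here without ever setting a value
    try {
        f.get();
    } catch (const std::future_error& e) {
        return e.code() ==
               std::make_error_code(std::future_errc::broken_promise);
    }
    return false;
}
```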
Synchronization Patterns • Key issues • Avoiding meaningful race conditions and deadlock • Scoped Locking (via the C++ RAII Idiom) • Ensures a lock is acquired/released in a scope • Thread-Safe Interface • Reduce internal locking overhead • Avoid self-deadlock • Strategized Locking • Customize locks for safety, liveness, optimization
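Scoped Locking and Thread-Safe Interface can be combined in one small sketch. The `counter` class is hypothetical: public methods acquire the lock through a `std::lock_guard`, while the private `_i` method assumes the lock is already held, so no method ever tries to reacquire it.

```cpp
#include <mutex>
#include <thread>

class counter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m_);  // released at end of scope
        increment_i();
    }
    void increment_twice() {
        std::lock_guard<std::mutex> lock(m_);  // locked once at the interface
        increment_i();                         // internal calls do not lock,
        increment_i();                         // which avoids self-deadlock
    }
    int value() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }

private:
    void increment_i() { ++value_; }  // precondition: caller holds m_

    mutable std::mutex m_;
    int value_ = 0;
};

int run_counter() {
    counter c;
    auto work = [&c] { for (int i = 0; i < 1000; ++i) c.increment_twice(); };
    std::thread t(work);
    std::thread u(work);
    t.join();
    u.join();
    return c.value();
}
```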
Concurrency Patterns • Key issue: sharing resources across threads • Thread Specific Storage Pattern • Separates resource access to avoid contention among them • Monitor Object Pattern • One thread at a time can access the object’s resources • Active Object Pattern • One worker thread owns the object’s resources • Half-Sync/Half-Async (HSHA) Pattern • A thread collects asynchronous requests and works on the requests synchronously (similar to Active Object) • Leader/Followers Pattern • Optimize HSHA for independent messages/threads
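As one example from this list, a Monitor Object can be sketched as a queue whose methods are all synchronized on a single lock, with a condition variable for waiting. The `monitor_queue` class is a hypothetical illustration, not code from the course.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

class monitor_queue {
public:
    void put(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);  // one thread at a time inside
            q_.push(v);
        }
        cv_.notify_one();
    }
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate form of wait also handles spurious wakeups.
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

int sum_through_monitor() {
    monitor_queue q;
    std::thread producer([&q] { for (int i = 1; i <= 100; ++i) q.put(i); });
    int sum = 0;
    for (int i = 0; i < 100; ++i) {
        sum += q.take();  // blocks until the producer has supplied a value
    }
    producer.join();
    return sum;
}
```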