
CSE 532 Fall 2013 Midterm Exam

CSE 532 Fall 2013 Midterm Exam. 80 minutes, during the usual lecture/studio time: 10:10am to 11:30am on Monday, October 21, 2013. Arrive early if you can; the exam will begin promptly at 10:10am. Held in Bryan 305 (NOT URBAUER 218). You may want to locate the exam room in advance.


Presentation Transcript


  1. CSE 532 Fall 2013 Midterm Exam • 80 minutes, during the usual lecture/studio time: 10:10am to 11:30am on Monday October 21, 2013 • Arrive early if you can, exam will begin promptly at 10:10am • Held in Bryan 305 (NOT URBAUER 218) • You may want to locate the exam room in advance • Exam is open book, open notes, hard copy only • I will bring a copy each of the required and optional texts for people to come up to the front and take a look at as needed • Please feel free to print and bring in slides, your notes, etc. • ALL ELECTRONICS MUST BE OFF DURING THE EXAM (including phones, iPads, laptops, tablets, etc.)

  2. What is Generic Programming? • An abstraction technique for algorithms • Argument types are as general as possible • Separates algorithm steps & argument properties • What GoF design pattern does this resemble? • Type requirements can be specified, systematized • Can cluster requirements into abstractions • Termed “concepts” • Concepts can refine other concepts • Captures relationships between concepts • Analogy to inheritance hierarchies • A type that meets a concept’s requirements • Is a “model” of the concept • Can be plugged in to meet that set of requirements
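
As a rough illustration of the ideas on this slide, the sketch below shows a generic algorithm whose requirements are stated only as comments (C++11 has no language-level concepts). The names `largest` and `Version` are hypothetical and not from the course material; `Version` is one possible "model" of the informal LessThanComparable concept.

```cpp
#include <string>
#include <vector>

// Hypothetical generic algorithm: iterator to the largest element in
// [first, last), or last if the range is empty.
// Requirements: Iter models a forward iterator; its value type models
// "LessThanComparable" (the expression a < b is valid).
template <typename Iter>
Iter largest(Iter first, Iter last) {
    if (first == last) return last;
    Iter best = first;
    for (++first; first != last; ++first) {
        if (*best < *first) best = first;
    }
    return best;
}

// A user-defined type that models LessThanComparable, so it can be
// plugged in to the algorithm without changing the algorithm.
struct Version {
    int major_no;
    int minor_no;
};

bool operator<(const Version& a, const Version& b) {
    return a.major_no != b.major_no ? a.major_no < b.major_no
                                    : a.minor_no < b.minor_no;
}
```

The algorithm's steps never mention `int`, `std::string`, or `Version`; any type that satisfies the stated requirements works.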

  3. Construct a Thread to Launch It • The std::thread constructor takes any callable type • A function, function pointer, function object, or lambda • This includes things like member function pointers, etc. • Additional constructor arguments passed to callable instance • Watch out for argument passing semantics, though • Constructor arguments are copied locally without conversion • Need to wrap references in std::ref, force conversions, etc. • Default construction (without a thread) also possible • Can transfer ownership of it via C++11 move semantics, i.e., using std::move from one std::thread object to another • Be careful not to move thread ownership to a std::thread object that already owns one (terminates the program)
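
The argument-passing and ownership points above can be sketched as follows. The functions `add_to` and `run_demo` are hypothetical names chosen for illustration; only `std::thread`, `std::ref`, and `std::move` come from the standard library.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Hypothetical worker: adds amount to the caller's counter.
void add_to(int& counter, int amount) { counter += amount; }

int run_demo() {
    int counter = 0;

    // Constructor arguments are copied into the new thread, so a reference
    // parameter must be wrapped in std::ref to reach the caller's variable.
    std::thread t1(add_to, std::ref(counter), 5);

    // Ownership moves to t2; t1 no longer owns a thread afterwards.
    std::thread t2 = std::move(t1);
    bool t1_still_owns = t1.joinable();
    t2.join();  // wait for the worker before touching counter again

    // A lambda is also a callable; this one captures counter by reference.
    std::thread t3([&counter] { counter += 1; });
    t3.join();

    return t1_still_owns ? -1 : counter;
}
```

Move-assigning into a `std::thread` that still owns a running thread would call `std::terminate`, which is why `t2` is freshly constructed here.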

  4. Always Join or Detach a Launched Thread • Often you should join with each launched thread • E.g., to wait until a result it produces is ready to retrieve • E.g., to keep a resource it needs available for its lifetime • However, for truly independent threads, can detach • Relinquishes parent thread’s option to rendezvous with it • Need to copy all resources into the thread up front • Avoids circular wait deadlocks (if A joins B and B joins A) • Need to ensure join or detach for each thread • E.g., if an exception is thrown, still need to make it so • The guard (a.k.a. RAII) idiom helps with this, since guard’s destructor always joins or detaches if needed • The std::thread::joinable() method can be used to test that
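
A minimal sketch of the guard (RAII) idiom mentioned above, assuming the policy is "join if still joinable". The class name `thread_guard` and the helper `guarded_run` are illustrative, not a definitive implementation.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>

// Joins the referenced thread on scope exit if nobody has done so yet.
class thread_guard {
    std::thread& t_;
public:
    explicit thread_guard(std::thread& t) : t_(t) {}
    ~thread_guard() {
        if (t_.joinable()) t_.join();
    }
    thread_guard(const thread_guard&) = delete;
    thread_guard& operator=(const thread_guard&) = delete;
};

// Hypothetical driver: returns true if the worker finished, whether or
// not an exception was thrown after the thread was launched.
bool guarded_run(bool throw_midway) {
    std::atomic<bool> done(false);
    try {
        std::thread worker([&done] { done = true; });
        thread_guard guard(worker);  // destroyed before worker, so it joins first
        if (throw_midway) throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // The guard's destructor already joined the worker during unwinding.
    }
    return done.load();
}
```

Without the guard, the exception path would destroy a joinable `std::thread`, which terminates the program.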

  5. Design for Multithreaded Programming • Concurrency • Logical (single processor): instruction interleaving • Physical (multi-processor): parallel execution • Safety • Threads must not corrupt objects or resources • More generally, bad inter-leavings must be avoided • Atomic: runs to completion without being preempted • Granularity at which operations are atomic matters • Liveness • Progress must be made (deadlock is avoided) • Goal: full utilization (something is always running)

  6. Multi-Threaded Design, Continued • Race conditions (threads racing for access) • Two or more threads access an object/resource • The interleaving of their statements matters • Some inter-leavings have bad consequences • Example (critical sections) • Object has two variables x ∈ {A,C}, y ∈ {B,D} • Allowed states of the object are AB or CD • Assume each write is atomic, but writing both is not • Thread t writes x = A; and is then preempted • Thread u writes x = C; y = D; and blocks • Thread t writes y = B; • Object is left in an inconsistent state, CB
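
One way to rule out the CB state in the example above is to make both writes a single critical section. The sketch below assumes a mutex-based fix; the class `Pair` and the function `stays_consistent` are hypothetical names.

```cpp
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical object whose only valid states are AB and CD.
class Pair {
    mutable std::mutex m_;
    char x_ = 'A';
    char y_ = 'B';
public:
    // Each setter writes both variables while holding the lock, so no
    // other thread can observe or interleave with a half-updated state.
    void set_ab() { std::lock_guard<std::mutex> lock(m_); x_ = 'A'; y_ = 'B'; }
    void set_cd() { std::lock_guard<std::mutex> lock(m_); x_ = 'C'; y_ = 'D'; }
    std::pair<char, char> get() const {
        std::lock_guard<std::mutex> lock(m_);
        return {x_, y_};
    }
};

// Threads t and u race to set the state while the caller samples it.
bool stays_consistent(int rounds) {
    Pair p;
    std::thread t([&] { for (int i = 0; i < rounds; ++i) p.set_ab(); });
    std::thread u([&] { for (int i = 0; i < rounds; ++i) p.set_cd(); });
    bool ok = true;
    for (int i = 0; i < rounds; ++i) {
        std::pair<char, char> s = p.get();
        bool ab = s.first == 'A' && s.second == 'B';
        bool cd = s.first == 'C' && s.second == 'D';
        ok = ok && (ab || cd);
    }
    t.join();
    u.join();
    return ok;
}
```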

  7. Multi-Threaded Programming, Continued • Deadlock • One or more threads access an object/resource • Access to the resource is serialized • Chain of accesses leads to mutual blocking • Single-threaded example (“self-deadlock”) • A thread acquires then tries to reacquire same lock • If lock is not recursive thread blocks itself • Two thread example (“deadly embrace”) • Thread t acquires lock j, thread u acquires lock k • Thread t tries to acquire lock k, blocks • Thread u tries to acquire lock j, blocks
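
The "deadly embrace" above arises when two threads take the same two locks in opposite orders. A minimal sketch of one standard remedy, `std::lock`, which acquires several mutexes without deadlocking; `Account`, `transfer`, and `run_transfers` are hypothetical names.

```cpp
#include <mutex>
#include <thread>

struct Account {
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Assumes from and to are distinct objects (locking the same
// non-recursive mutex twice would be the self-deadlock case).
void transfer(Account& from, Account& to, int amount) {
    std::lock(from.m, to.m);  // takes both locks with deadlock avoidance
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Thread t locks (a, b) while thread u locks (b, a): the opposite
// argument order that would deadlock with two plain lock() calls.
int run_transfers(int rounds) {
    Account a(1000), b(1000);
    std::thread t([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread u([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t.join();
    u.join();
    return a.balance + b.balance;
}
```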

  8. Atomic Types • Many atomic types in C++11, at least some lock-free • Always lock-free: std::atomic_flag • If it matters, must test others with is_lock_free() • Also can specialize std::atomic<> class template • This is already done for many standard non-atomic types • Can also do this for your own types that implement a trivial copy-assignment operator and are bitwise equality comparable • Watch out for semantic details • E.g., bitwise evaluation of float, double, etc. representations • Equivalence may differ under atomic operations
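
As a small illustration of the points above, the sketch below builds a spinlock on `std::atomic_flag` and queries `is_lock_free()` on another atomic type. The names `spinlock`, `count_with_spinlock`, and `int_counter_is_lock_free` are hypothetical, and the result of the lock-free query depends on the platform.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spinlock built on std::atomic_flag, the one type guaranteed lock-free.
class spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value: spin while it was set.
        while (flag_.test_and_set(std::memory_order_acquire)) {}
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Several threads increment a plain int under the spinlock.
int count_with_spinlock(int n_threads, int per_thread) {
    spinlock lk;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&lk, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// For any atomic other than atomic_flag, lock-freedom must be queried.
bool int_counter_is_lock_free() {
    std::atomic<int> a(0);
    return a.is_lock_free();
}
```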

  9. Reasoning about Concurrency • Operations on atomic types semantically well defined • Synchronizes-with relationship ensures that (unambiguously) operation X happens before or after operation Y • Can leverage this so eventually Xi happens-before Yj • Transitivity then lets you build various happens-before cases, including inter-thread happens-before relationships • Other variations on this theme are also useful • Dependency-ordered-before and carries-a-dependency-to are used to reason about cases involving data dependencies • I.e., the result of one atomic operation is used in another

  10. Memory Models and Design • Trading off stricter ordering vs. higher overhead • Sequential consistency is easiest to think about (implement) because it imposes a consistent global total order on threads • Acquire-release consistency relaxes that to a pair-wise partial order that admits more concurrency • Relaxed ordering is least expensive, but should be applied selectively where available to optimize performance • Even these basic constructs allow sophisticated design • Memory fences to synchronize otherwise relaxed segments • Release chains offer a similar idea for data dependencies • Mixing atomic operations in with non-atomic ones to reduce overhead, but still enforce correct semantics (Chapter 7)
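
A minimal sketch of acquire-release ordering used to publish non-atomic data, which is one instance of mixing atomic and non-atomic operations. The function name `publish_and_read` and the value 42 are illustrative assumptions.

```cpp
#include <atomic>
#include <thread>

int publish_and_read() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready(false);   // the only atomic in the exchange

    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish it
    });

    // The acquire load that reads true synchronizes-with the release
    // store, so the write to payload happens-before the read below.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

Replacing both orderings with `std::memory_order_relaxed` would remove the happens-before edge and make the read of `payload` a data race.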

  11. Lock-Free and Wait-Free Semantics • Lock-free behavior never blocks (but may live-lock) • Suspension of one thread doesn’t impede others’ progress • Tries to do something, if cannot just tries again • E.g., while(!head.compare_exchange_weak(n->next,n)); • Wait-free behavior never starves a thread • Progress of each is guaranteed (bounded number of retries) • Lock-free data structures try for maximum concurrency • E.g., ensuring some thread makes progress at every step • May not be strictly wait-free but that’s something to aim for • Watch out for performance costs in practice • E.g., atomic operations are slower, may not be worth it • Some platforms may not relax memory consistency well
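
The retry loop on this slide can be sketched in context as a push-only stack. This is a simplified illustration under stated assumptions: the class `push_only_stack` is hypothetical, it uses the default sequentially consistent ordering, and it omits `pop` because safe memory reclamation is the hard part of a full lock-free stack.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

template <typename T>
class push_only_stack {
    struct node {
        T data;
        node* next;
        explicit node(const T& d) : data(d), next(nullptr) {}
    };
    std::atomic<node*> head_;
public:
    push_only_stack() : head_(nullptr) {}
    push_only_stack(const push_only_stack&) = delete;
    push_only_stack& operator=(const push_only_stack&) = delete;
    ~push_only_stack() {
        node* n = head_.load();
        while (n) { node* next = n->next; delete n; n = next; }
    }

    void push(const T& value) {
        node* n = new node(value);
        n->next = head_.load();
        // On failure compare_exchange_weak reloads the current head into
        // n->next, so the loop simply retries until the swap succeeds.
        while (!head_.compare_exchange_weak(n->next, n)) {}
    }

    // Only safe to call once no other thread is pushing.
    std::size_t size_unsafe() const {
        std::size_t count = 0;
        for (node* n = head_.load(); n; n = n->next) ++count;
        return count;
    }
};

// Hypothetical driver: several threads push concurrently.
std::size_t concurrent_pushes(int n_threads, int per_thread) {
    push_only_stack<int> s;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&s, per_thread] {
            for (int i = 0; i < per_thread; ++i) s.push(i);
        });
    }
    for (auto& w : workers) w.join();
    return s.size_unsafe();
}
```

The loop is lock-free rather than wait-free: some thread always succeeds, but a particular thread has no bound on its retries.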

  12. Lock-Free Stack Case Study • Even simplest version of push requires careful design • Allocate/initialize, then swap pointers via atomic operations • Need to deal with memory reclamation • Three strategies: thread counts, hazard pointers, ref counts • Memory models offer potential performance gains • E.g., if relaxed or acquire/release consistency is in fact weaker on the particular platform for which you’re developing • Need to profile that, e.g., as we did in the previous studio • Resulting lock free stack design is a good approach • E.g., Listing 7.12 in [Williams] • Please feel free to use (with appropriate citation comments) in your labs (we’ll code this up in the studio exercises)

  13. Lock-Free Queue Case Study • Contention differs in lock-free queue vs. stack • Enqueue/dequeue contention depends on how many nodes are in the queue, whereas push/pop contend unless empty • Synchronization needs (and thus design) are different • Application use cases also come into play • E.g., single-producer single-consumer queue is much simpler and may be all that is needed in some cases • Service configuration, template meta-programming, other approaches can enforce necessary properties of its use • Multi-thread-safe enqueue and dequeue operations • Modifications (to e.g., reference-counting) may be needed • May need to use work-stealing to be lock free (!)

  14. Lock Free Design Guidelines • Prototype data structures using sequential consistency • Then analyze and test thread-safety thoroughly • Then look for meaningful opportunities to relax consistency • Use a lock-free memory reclamation scheme • Count threads and then delete when quiescent • Use hazard pointers to track threads’ accesses to an object • Reference count and delete in a thread-safe way • Detach garbage and delegate deletion to another thread • Watch out for the ABA problem • E.g., with coupled variables, pop-push-pop issues • Identify busy waiting, then steal or delegate work • E.g., if thread would be blocked, “help it over the fence”

  15. Dividing Work Between Threads • Static partitioning of data can be helpful • Makes threads (mostly) independent, ahead of time • Threads can read from and write to their own locations • Some partitioning of data is necessarily dynamic • E.g., Quicksort uses a pivot at run-time to split up data • May need to launch (or pass data to) a thread at run-time • Can also partition work by task-type • E.g., hand off specific kinds of work to specialized threads • E.g., a thread-per-stage pipeline that is efficient once primed • Number of threads to use is a key design challenge • E.g., std::thread::hardware_concurrency() is only a starting point (blocking, scheduling, etc. also matter)
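
Static partitioning as described above might look like the following sketch, in which each thread sums its own contiguous chunk and writes only to its own result slot. The function `parallel_sum` and the fallback of 2 threads when `hardware_concurrency()` returns 0 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() is only a hint and may return 0.
    unsigned hw = std::thread::hardware_concurrency();
    std::size_t wanted = hw ? hw : 2;
    std::size_t n_threads =
        std::max<std::size_t>(1, std::min<std::size_t>(wanted, data.size()));

    std::size_t chunk = data.size() / n_threads;
    std::vector<long long> partial(n_threads, 0);
    std::vector<std::thread> workers;

    // Launch n_threads - 1 workers, each with a fixed, disjoint chunk.
    for (std::size_t i = 0; i + 1 < n_threads; ++i) {
        workers.emplace_back([&data, &partial, i, chunk] {
            partial[i] = std::accumulate(data.begin() + i * chunk,
                                         data.begin() + (i + 1) * chunk, 0LL);
        });
    }

    // The calling thread handles the last chunk plus any remainder.
    std::size_t last = n_threads - 1;
    partial[last] = std::accumulate(data.begin() + last * chunk, data.end(), 0LL);

    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```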

  16. Factors Affecting Performance • Need at least as many threads as hardware cores • Too few threads makes insufficient use of the resource • Oversubscription increases overhead due to task switching • Need to gauge for how long (and when) threads are active • Data contention and cache ping-pong • Performance degrades rapidly as cache misses increase • Need to design for low contention for cache lines • Need to avoid false sharing of elements (in same cache line) • Packing or spreading out data may be needed • E.g., localize each thread’s accesses • E.g., separate a shared mutex from the data that it guards
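
One way to spread data out so that threads do not share a cache line is to over-align each thread's slot. The sketch below assumes a 64-byte cache line, which is common but not universal; `kCacheLine`, `kThreads`, `padded_counter`, and `count_in_parallel` are hypothetical names.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; check the target platform before relying on it.
constexpr std::size_t kCacheLine = 64;
constexpr int kThreads = 4;

// alignas pads and aligns each counter to its own cache line, so one
// thread's writes do not invalidate the line holding another's counter.
struct alignas(kCacheLine) padded_counter {
    long value = 0;
};

long count_in_parallel(long per_thread) {
    padded_counter counters[kThreads];  // one slot per thread
    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i) {
        workers.emplace_back([&counters, i, per_thread] {
            for (long k = 0; k < per_thread; ++k) ++counters[i].value;
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (int i = 0; i < kThreads; ++i) total += counters[i].value;
    return total;
}
```

Whether the padding pays off is a profiling question; it trades memory for fewer cache-line transfers.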

  17. Additional Considerations • Exception safety • Affects both lock based and lock-free synchronization • Use std::packaged_task and std::future to allow for an exception being thrown in a thread (see listing 8.3) • Scalability • How much of the code is actually parallelizable? • Various theoretical formulas (including Amdahl’s) apply • Hiding latency • If nothing ever blocks you may not need concurrency • If something does, concurrency makes parallel progress • Improving responsiveness • Giving each thread its own task may simplify, speed up tasks
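
A minimal sketch of the exception-safety point above: an exception thrown inside a `std::packaged_task` is stored in the shared state and rethrown when the launching thread calls `get()`. This is not the book's listing; `risky_parse` and `run_in_thread` are hypothetical names.

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical work that may throw: std::stoi throws
// std::invalid_argument when no conversion can be performed.
int risky_parse(const std::string& s) { return std::stoi(s); }

std::string run_in_thread(const std::string& input) {
    std::packaged_task<int(const std::string&)> task(risky_parse);
    std::future<int> result = task.get_future();

    // The task is move-only, so it is moved into the thread.
    std::thread worker(std::move(task), input);
    worker.join();

    try {
        // get() returns the value or rethrows the stored exception here,
        // in the calling thread, instead of terminating the program.
        return "ok: " + std::to_string(result.get());
    } catch (const std::invalid_argument&) {
        return "failed: invalid argument";
    }
}
```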

  18. Thread Pools • Simplest version • A thread per core, all run a common worker thread function • Waiting for tasks to complete • Promises and futures give rendezvous with work completion • Could also post work results on an active object’s queue, which also may help avoid cache ping-pong • Futures also help with exception safety, e.g., a thrown exception propagates to thread that calls get on the future • Granularity of work is another key design decision • Too small and the overhead of managing the work adds up • Too coarse and responsiveness and concurrency may suffer • Work stealing lets idle threads relieve busy ones • May need to hand off promise as well as work, etc.
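
The "simplest version" plus futures for completion could be sketched as below. This is an illustrative design under simplifying assumptions (a single mutex-protected queue, tasks that return `int`, no work stealing); the class `simple_pool` and its members are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class simple_pool {
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;

    // Common worker function run by every thread in the pool.
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock
        }
    }
public:
    explicit simple_pool(unsigned n) {
        if (n == 0) n = 1;
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back(&simple_pool::worker_loop, this);
    }
    ~simple_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    simple_pool(const simple_pool&) = delete;
    simple_pool& operator=(const simple_pool&) = delete;

    // The returned future is the rendezvous point; it also carries any
    // exception thrown by the work to whichever thread calls get().
    std::future<int> submit(std::function<int()> work) {
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(work));
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }
};
```

The `shared_ptr` wrapper is there because `std::function` requires a copyable callable while `std::packaged_task` is move-only.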

  19. Interrupting Threads (Part I) • Thread with interruption point is (cleanly) interruptible • Another thread can set a flag that it will notice and then exit • Clever use of lambdas, promises, move semantics lets a thread-local interrupt flag be managed (see listing 9.9) • Need to be careful to avoid dangling pointers on thread exit • For simple cases, detecting interruption may be trivial • E.g., event loop with interruption point checked each time • For condition variables interruption is more complex • E.g., using the guard idiom to avoid exception hazards • E.g., waiting with a timeout (and handling spurious wakes) • Can eliminate spurious wakes with a scheme based on a custom lock and a condition_variable_any (listing 9.12)
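
For the simple case above (a loop with an interruption point checked each time), a sketch might look like this. It is deliberately simpler than the thread-local scheme the slide cites: the flag is passed explicitly, and `interrupt_flag`, `thread_interrupted`, `interruption_point`, and `run_until_interrupted` are hypothetical names.

```cpp
#include <atomic>
#include <thread>

struct thread_interrupted {};  // thrown at an interruption point

class interrupt_flag {
    std::atomic<bool> set_;
public:
    interrupt_flag() : set_(false) {}
    void set() { set_.store(true); }
    bool is_set() const { return set_.load(); }
};

// The worker calls this wherever it is safe to stop.
void interruption_point(const interrupt_flag& flag) {
    if (flag.is_set()) throw thread_interrupted();
}

// Returns the number of loop iterations completed before the worker
// noticed the interruption, or -1 if it did not exit via the exception.
long run_until_interrupted() {
    interrupt_flag flag;
    std::atomic<long> iterations(0);
    std::atomic<bool> exited_cleanly(false);

    std::thread worker([&] {
        try {
            for (;;) {
                interruption_point(flag);  // checked each time round
                ++iterations;
            }
        } catch (const thread_interrupted&) {
            exited_cleanly = true;  // unwind normally, then return
        }
    });

    // Let the worker make some progress, then ask it to stop.
    while (iterations.load() == 0) std::this_thread::yield();
    flag.set();
    worker.join();
    return exited_cleanly.load() ? iterations.load() : -1;
}
```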

  20. Interrupting Threads (Part II) • Unlike condition variable waits, thread interruption with other blocking calls goes back to timed waiting • No access to internals of locking and unlocking semantics • Best you can do is unblock and check frequently (with interval chosen to balance overhead and responsiveness) • Handling interruptions • Can use standard exception handling in interrupted thread • Can use promises and futures between threads to propagate • Put a catch block in the wrapper that initializes the interrupt flag, so uncaught exception doesn’t end the entire program • Can combine interruption and joining with threads • E.g., to stop background threads and wait for them to end

  21. Concurrency Related Bugs • Deadlock occurs when a thread never unblocks • Complete deadlock occurs when no thread ever unblocks • Blocking I/O can be problematic (e.g., if input never arrives) • Livelock is similar but involves futile effort • Threads are not blocked, but never make real progress • E.g., if a condition never occurs, or with protocol bugs • Data races and broken invariants • Can corrupt data, dangle pointers, double free, leak data • Lifetime of thread relative to its data also matters • If thread exits without freeing resources they can leak • If resources are freed before thread is done with them (or even gains access to them) behavior may be undefined

  22. Locating Concurrency Related Bugs • Inspection can be useful but easily misses subtle bugs • Any possible sequence of relevant actions may matter • Explanation/modeling can be even more powerful • Speculate about how different sequences can manifest • Even (or especially) unlikely ones: what if another thread…? • Gain experience with different races, deadlocks • Try those on for size with respect to code you’re testing • E.g., ABA issues, circular waits, etc. • Hypothesize, predict, instrument & observe, repeat • The scientific method is the most powerful debugging tool • Develop concurrency related regression test suites • Invest in testing harnesses that drive event sequences, etc. (e.g., boost statecharts may let you automate some of this)

  23. Design for Testability • Consider doing formal modeling of concurrency • E.g., for model checking of temporal or timed temporal logic • Good tools exist to help with this (SPIN, UPPAAL, IF, etc.) • At least consider what tests you’ll run as part of design • Can help you avoid concurrency design mistakes initially • Can help you maintain regression tests as code evolves: e.g. how likely will you spot a newly introduced race in old code? • Design for pluggable concurrency • Single threaded vs. logically vs. physically concurrent • A pluggable scheduler and modular units of work can help • Taken to its extreme do combination simulation testing • Combining all of the above potentially lets you explore, test, and then reproduce different concurrency scenarios reliably

  24. What is a Pattern Language? • A narrative that composes patterns • Not just a catalog or listing of the patterns • Reconciles design forces between patterns • Provides an outline for design steps • A generator for a complete design • Patterns may leave consequences • Other patterns can resolve them • Generative designs resolve all forces • Internal tensions don’t “pull design apart”

  25. Categories of Patterns (for CSE 532) • Service Access and Configuration • Appropriate programming interfaces/abstractions • Event Handling • Inescapable in networked systems • Concurrency • Exploiting physical and logical parallelism • Synchronization • Managing safety and liveness in concurrent systems

  26. Wrapper Facade • Existing procedural API: pthread_create (thread, attr, start_routine, arg); pthread_exit (status); pthread_cancel (thread); … • Wrapper class thread: thread (); thread (function, args); ~thread(); join(); … • Combines related functions/data (OO, generic) • Used to adapt existing procedural APIs • Offers better interfaces • Concise, maintainable, portable, cohesive, type safe

  27. Asynchronous Completion Token Pattern • A service (eventually) passes a “cookie” to client • Examples with C++11 futures and promises • A future (eventually) holds ACT (or an exception) from which initiator can obtain the result • Client thread can block on a call to get the data or can repeatedly poll (with timeouts if you’d like) for it • A future can be packaged up with an asynchronously running service in several ways • Directly: e.g., returned by std::async • Bundled: e.g., via a std::packaged_task • As a communication channel: e.g., via std::promise • A promise can be kept or broken • If broken, an exception is thrown to client
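
The three ways of obtaining a future listed above, and the broken-promise case, can be sketched with standard C++11 facilities. The function names `via_async`, `via_promise`, and `broken_promise_throws` are hypothetical; `std::packaged_task` is omitted here for brevity.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Directly: std::async returns the future.
int via_async() {
    std::future<int> f = std::async(std::launch::async, [] { return 21 * 2; });
    return f.get();  // blocks until the result is ready
}

// As a communication channel: the service fulfils a promise while the
// client polls the matching future with a timeout.
int via_promise() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread service([&p] { p.set_value(7); });
    while (f.wait_for(std::chrono::milliseconds(1)) != std::future_status::ready) {
        // The client could do other useful work between polls.
    }
    int value = f.get();
    service.join();
    return value;
}

// A promise destroyed without a value is "broken": get() then throws
// std::future_error with the code broken_promise.
bool broken_promise_throws() {
    std::future<int> f;
    {
        std::promise<int> p;
        f = p.get_future();
    }  // p destroyed here without set_value
    try {
        f.get();
    } catch (const std::future_error& e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;
}
```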

  28. Synchronization Patterns • Key issues • Avoiding meaningful race conditions and deadlock • Scoped Locking (via the C++ RAII Idiom) • Ensures a lock is acquired/released in a scope • Thread-Safe Interface • Reduce internal locking overhead • Avoid self-deadlock • Strategized Locking • Customize locks for safety, liveness, optimization
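
A minimal sketch combining Scoped Locking with the Thread-Safe Interface pattern: public methods acquire the lock once, and private helpers assume it is already held, which avoids both repeated locking and self-deadlock. The class `bounded_log` and the `_i` suffix for the private helpers are illustrative assumptions.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

class bounded_log {
    std::mutex m_;
    std::vector<int> items_;
    std::size_t cap_;

    // Implementation methods: never lock, caller must hold m_.
    bool full_i() const { return items_.size() >= cap_; }
    void drop_oldest_i() { items_.erase(items_.begin()); }

public:
    explicit bounded_log(std::size_t cap) : cap_(cap) {}

    // Interface methods: lock once via a scoped guard, then delegate.
    void add(int value) {
        std::lock_guard<std::mutex> lock(m_);  // released on scope exit
        if (cap_ == 0) return;
        if (full_i()) drop_oldest_i();  // no re-locking, so no self-deadlock
        items_.push_back(value);
    }

    std::vector<int> snapshot() {
        std::lock_guard<std::mutex> lock(m_);
        return items_;  // copy made while the lock is held
    }
};
```

If `add` instead called another public, locking method, a non-recursive mutex would block the thread on itself.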

  29. Concurrency Patterns • Key issue: sharing resources across threads • Thread Specific Storage Pattern • Separates resource access to avoid contention among them • Monitor Object Pattern • One thread at a time can access the object’s resources • Active Object Pattern • One worker thread owns the object’s resources • Half-Sync/Half-Async (HSHA) Pattern • A thread collects asynchronous requests and works on the requests synchronously (similar to Active Object) • Leader/Followers Pattern • Optimize HSHA for independent messages/threads

