CSE 532 Fall 2013 Take Home Final Exam • Posted on the course web site in Word and Acrobat • Complete electronically then email to cdgill@cse.wustl.edu • Alternatively, you can print, complete, and submit hard copy • Please complete the exam without consulting others • Except Prof. Gill, to whom you can individually e-mail any clarifying questions, etc. that you may have as you go • You are free to use compilers, web site slides, etc. • Copies of the required and optional text books will be on reserve on the CSE Department office (Bryan 509C) • You may search for and read other sources of information on the internet, but you may not post questions for others there
Construct a Thread to Launch It • The std::thread constructor takes any callable type • A function, function pointer, function object, or lambda • This includes things like member function pointers, etc. • Additional constructor arguments passed to callable instance • Watch out for argument passing semantics, though • Constructor arguments are copied locally without conversion • Need to wrap references in std::ref, force conversions, etc. • Default construction (without a thread) also possible • Can transfer ownership of it via C++11 move semantics, i.e., using std::move from one std::thread object to another • Be careful not to move thread ownership to a std::thread object that already owns one (terminates the program)
Always Join or Detach a Launched Thread • Often you should join with each launched thread • E.g., to wait until a result it produces is ready to retrieve • E.g., to keep a resource it needs available for its lifetime • However, for truly independent threads, can detach • Relinquishes parent thread’s option to rendezvous with it • Need to copy all resources into the thread up front • Avoids circular wait deadlocks (if A joins B and B joins A) • Need to ensure join or detach for each thread • E.g., if an exception is thrown, still need to make it so • The guard (a.k.a. RAII) idiom helps with this, since guard’s destructor always joins or detaches if needed • The std::thread::joinable() method can be used to test that
Design for Multithreaded Programming • Concurrency • Logical (single processor): instruction interleaving • Physical (multi-processor): parallel execution • Safety • Threads must not corrupt objects or resources • More generally, bad interleavings must be avoided • Atomic: runs to completion without being preempted • Granularity at which operations are atomic matters • Liveness • Progress must be made (deadlock is avoided) • Goal: full utilization (something is always running)
Atomic Types • Many atomic types in C++11, at least some lock-free • Always lock-free: std::atomic_flag • If it matters, must test others with is_lock_free() • Also can specialize the std::atomic<> class template • This is already done for many standard non-atomic types • Can also do this for your own types that have a trivial copy-assignment operator and are bitwise equality comparable • Watch out for semantic details • E.g., bitwise evaluation of float, double, etc. representations • Equivalence may differ under atomic operations
Lock-Free and Wait-Free Semantics • Lock-free behavior never blocks (but may live-lock) • Suspension of one thread doesn’t impede others’ progress • Tries to do something, if cannot just tries again • E.g., while(head.compare_exchange_weak(n->next,n)); • Wait-free behavior never starves a thread • Progress of each is guaranteed (bounded number of retries) • Lock-free data structures try for maximum concurrency • E.g., ensuring some thread makes progress at every step • May not be strictly wait-free but that’s something to aim for • Watch out for performance costs in practice • E.g., atomic operations are slower, may not be worth it • Some platforms may not relax memory consistency well
Lock Free Design Guidelines • Prototype data structures using sequential consistency • Then analyze and test thread-safety thoroughly • Then look for meaningful opportunities to relax consistency • Use a lock-free memory reclamation scheme • Count threads and then delete when quiescent • Use hazard pointers to track threads accesses to an object • Reference count and delete in a thread-safe way • Detach garbage and delegate deletion to another thread • Watch out for the ABA problem • E.g., with coupled variables, pop-push-pop issues • Identify busy waiting, then steal or delegate work • E.g., if thread would be blocked, “help it over the fence”
Dividing Work Between Threads • Static partitioning of data can be helpful • Makes threads (mostly) independent, ahead of time • Threads can read from and write to their own locations • Some partitioning of data is necessarily dynamic • E.g., Quicksort uses a pivot at run-time to split up data • May need to launch (or pass data to) a thread at run-time • Can also partition work by task-type • E.g., hand off specific kinds of work to specialized threads • E.g., a thread-per-stage pipeline that is efficient once primed • Number of threads to use is a key design challenge • E.g., std::thread::hardware_concurrency() is only a starting point (blocking, scheduling, etc. also matter)
Factors Affecting Performance • Need at least as many threads as hardware cores • Too few threads makes insufficient use of the resource • Oversubscription increases overhead due to task switching • Need to gauge for how long (and when) threads are active • Data contention and cache ping-pong • Performance degrades rapidly as cache misses increase • Need to design for low contention for cache lines • Need to avoid false sharing of elements (in same cache line) • Packing or spreading out data may be needed • E.g., localize each thread’s accesses • E.g., separate a shared mutex from the data that it guards
Additional Considerations • Exception safety • Affects both lock-based and lock-free synchronization • Use std::packaged_task and std::future to allow for an exception being thrown in a thread (see listing 8.3) • Scalability • How much of the code is actually parallelizable? • Various theoretical formulas (including Amdahl’s) apply • Hiding latency • If nothing ever blocks you may not need concurrency • If something does, concurrency makes parallel progress • Improving responsiveness • Giving each thread its own task may simplify, speed up tasks
Thread Pools • Simplest version • A thread per core, all run a common worker thread function • Waiting for tasks to complete • Promises and futures give rendezvous with work completion • Could also post work results on an active object’s queue, which also may help avoid cache ping-pong • Futures also help with exception safety, e.g., a thrown exception propagates to the thread that calls get on the future • Granularity of work is another key design decision • Too small and the overhead of managing the work adds up • Too coarse and responsiveness and concurrency may suffer • Work stealing lets idle threads relieve busy ones • May need to hand off promise as well as work, etc.
Interrupting Threads (Part I) • Thread with interruption point is (cleanly) interruptible • Another thread can set a flag that it will notice and then exit • Clever use of lambdas, promises, move semantics lets a thread-local interrupt flag be managed (see listing 9.9) • Need to be careful to avoid dangling pointers on thread exit • For simple cases, detecting interruption may be trivial • E.g., event loop with interruption point checked each time • For condition variables interruption is more complex • E.g., using the guard idiom to avoid exception hazards • E.g., waiting with a timeout (and handling spurious wakes) • Can eliminate spurious wakes with a scheme based on a custom lock and a condition_variable_any (listing 9.12)
Concurrency Related Bugs • Deadlock occurs when a thread never unblocks • Complete deadlock occurs when no thread ever unblocks • Blocking I/O can be problematic (e.g., if input never arrives) • Livelock is similar but involves futile effort • Threads are not blocked, but never make real progress • E.g., if a condition never occurs, or with protocol bugs • Data races and broken invariants • Can corrupt data, dangle pointers, double free, leak data • Lifetime of thread relative to its data also matters • If thread exits without freeing resources they can leak • If resources are freed before thread is done with them (or even gains access to them) behavior may be undefined
What is a Pattern Language? • A narrative that composes patterns • Not just a catalog or listing of the patterns • Reconciles design forces between patterns • Provides an outline for design steps • A generator for a complete design • Patterns may leave consequences • Other patterns can resolve them • Generative designs resolve all forces • Internal tensions don’t “pull design apart”
Categories of Patterns (for CSE 532) • Service Access and Configuration • Appropriate programming interfaces/abstractions • Event Handling • Inescapable in networked systems • Concurrency • Exploiting physical and logical parallelism • Synchronization • Managing safety and liveness in concurrent systems
Wrapper Facade • Adapts an existing procedural API, e.g., POSIX threads: pthread_create (thread, attr, start_routine, arg); pthread_exit (status); pthread_cancel (thread); … • …into a cohesive class, e.g., C++11 std::thread: thread (); thread (function, args); ~thread(); join(); … • Combines related functions/data (OO, generic) • Used to adapt existing procedural APIs • Offers better interfaces: concise, maintainable, portable, cohesive, type safe
Asynchronous Completion Token Pattern • A service (eventually) passes a “cookie” to client • Examples with C++11 futures and promises • A future (eventually) holds ACT (or an exception) from which initiator can obtain the result • Client thread can block on a call to get the data or can repeatedly poll (with timeouts if you’d like) for it • A future can be packaged up with an asynchronously running service in several ways • Directly: e.g., returned by std::async • Bundled: e.g., via a std::packaged_task • As a communication channel: e.g., via std::promise • A promise can be kept or broken • If broken, an exception is thrown to client
Synchronization Patterns • Key issues • Avoiding meaningful race conditions and deadlock • Scoped Locking (via the C++ RAII Idiom) • Ensures a lock is acquired/released in a scope • Thread-Safe Interface • Reduce internal locking overhead • Avoid self-deadlock • Strategized Locking • Customize locks for safety, liveness, optimization
Concurrency Patterns • Key issue: sharing resources across threads • Thread Specific Storage Pattern • Separates resource access to avoid contention among threads • Monitor Object Pattern • One thread at a time can access the object’s resources • Active Object Pattern • One worker thread owns the object’s resources • Half-Sync/Half-Async (HSHA) Pattern • A thread collects asynchronous requests and works on the requests synchronously (similar to Active Object) • Leader/Followers Pattern • Optimize HSHA for independent messages/threads
Event Driven Server • Inversion of control • Hollywood principle – “Don’t call us, we’ll call you” (there is no main) • Event dispatching logic is reusable (e.g., from ACE) • Event handling logic is pluggable (you write it for your application) • E.g., a Connection Acceptor’s handle_connection_request and a Data Reader’s handle_data_read are event handlers invoked by the dispatching logic
Client and Server Roles • Each process plays a single role • E.g., Logging Server example • Logging Server gets info from several clients • Roles affect connection establishment as well as use • Clients actively initiate connection requests • Server passively accepts connection requests (on a listening port) • Client/server roles may overlap • Allow flexibility to act as either one • Or, to act as both at once (e.g., in a publish/subscribe gateway)
Reactor Pattern Solution Approach • The Reactor separates de-multiplexing & dispatching from application logic • Reactor waits synchronously on the event sources, then serially dispatches events to the registered event handlers • Application logic lives in the event handlers
Acceptor/Connector Solution Approach • Connection establishment: passive Acceptor and active Connector roles • Service instantiation & initialization: Acceptor/Connector create and initialize a service Handler • Event de-multiplexing & dispatching: the Dispatcher waits on the event sources • Service handling (connection use): performed by the Handler
Proactor in a Nutshell • Application creates completion handlers and registers them • Application associates handles with the I/O completion port • Application initiates asynchronous I/O operations, each tagged with an ACT • Proactor waits on the I/O completion port • OS (or AIO emulation) performs the I/O and posts a completion event (carrying the ACT) to the completion port • Proactor dequeues the completion event and dispatches handle_event (complete) on the matching completion handler
Compare Reactor vs. Proactor Side by Side • Reactor: the application registers an Event Handler; the Reactor’s handle_events waits on handles; when a handle becomes ready, the Reactor dispatches handle_event, and the handler itself performs the (synchronous) accept/read/write • Proactor: the application initiates an ASYNCH accept/read/write on a handle; the Proactor’s handle_events waits for completions; when one arrives, it dispatches handle_event on the Completion Handler
Motivation for Interceptor Pattern • Fundamental concern • How to integrate out-of-band tasks (e.g., Admin) with in-band tasks (e.g., Client, Server)? • Straightforward (and all too common) approach • Paste the out-of-band logic wherever it’s needed • May be multiple places to paste it • Brittle, tedious, error-prone, costly in time/space (e.g., inlining) • Is there a better and more general approach?
Interceptor Solution Approach • Our goal is to find a general mechanism to integrate out-of-band tasks with in-band tasks • In-band tasks • Processed as usual via framework • Out-of-band tasks • register with framework via special interfaces • are triggered by framework on certain events • Some events are generated by in-band processing • are provided access to framework internals (i.e., context) via specific interfaces
Component Configurator Pattern • Motivation • 7x24 server upgrades • “always on, always connected” • Web server load balancing • Work flows (re-)distributed to available endsystems • Mobile agents • Service reconfiguration on agent arrival/departure • Solution Approach • Decouple implementations over time • Allow different behaviors of services at run-time • Offer points where implementations are re-configured • Allow configuration decisions to be deferred until service initiation or while service is running (suspend/resume)
Service Lifecycle • Compare picture to • Thread states • Process states • E.g., Silberschatz & Galvin 4th ed, Fig. 4.1 • Can “park” a service • Users wait for a bit • Or, upgrade a copy • Background reconfig • Hot swap when ready
Mixed Duration Request Handlers (MDRH): Vertical Design of an Architecture • Reactor + HS/HA or LF • Designed to handle streams of mixed duration requests • Focused on interactions among local mechanisms • Concurrency and synchronization concerns • Hand-offs among threads: the reactor (leader) thread enqueues requests and hands off chains to follower threads • Well suited for “hub and spokes” or “processing pipeline” style applications • However, in some applications, a distributed view is more appropriate
Horizontal Design of an Architecture • Application components are implemented as handlers • Use reactor threads to run input and output methods • Send requests to other handlers via sockets, upcalls • These in turn define key interception points end-to-end • (Diagram: handlers h1–h4 connected via sockets across reactors r1–r3)