
Multiprocessors and Threads



Presentation Transcript


  1. Multiprocessors and Threads Lecture 3 CS523S: Operating Systems

  2. Motivation • Enhanced Performance - • Concurrent execution of tasks for increased throughput (between processes) • Exploit concurrency in tasks (parallelism within a process) • Fault Tolerance - • graceful degradation in the face of failures CS523S: Operating Systems

  3. Basic MP Architectures • Single Instruction Single Data (SISD) - conventional uniprocessor designs. • Single Instruction Multiple Data (SIMD) - Vector and Array Processors • Multiple Instruction Single Data (MISD) - Not Implemented. • Multiple Instruction Multiple Data (MIMD) - conventional MP designs CS523S: Operating Systems

  4. MIMD Classifications • Tightly Coupled System - all processors share the same global memory and have the same address spaces (Typical SMP system). • Main memory for IPC and Synchronization. • Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, Clusters (Multi-Computer). • Message passing for IPC and synchronization. CS523S: Operating Systems

  5. MP Block Diagram [diagram: four CPUs, each with its own cache and MMU, connected through an interconnection network to multiple main-memory (MM) modules] CS523S: Operating Systems

  6. Memory Access Schemes • Uniform Memory Access (UMA) • Centrally located • All processors are equidistant (access times) • Non-Uniform Memory Access (NUMA) • physically partitioned but accessible by all • processors have the same address space • NO Remote Memory Access (NORMA) • physically partitioned, not accessible by all • processors have their own address spaces CS523S: Operating Systems

  7. Other Details of MP • Interconnection technology • Bus • Cross-Bar switch • Multistage Interconnect Network • Caching - Cache Coherence Problem! • Write-update • Write-invalidate • bus snooping CS523S: Operating Systems
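The cache-coherence problem above is visible even from user code: under a write-invalidate snooping protocol, two CPUs updating data that happens to share a cache line invalidate each other's copies on every write. The following is a minimal C sketch of that effect and the usual padding fix; the 64-byte line size and all names are illustrative assumptions, not anything from the slides.

```c
/* Sketch: how cache coherence shows up to the programmer as "false sharing".
 * Two counters packed into one cache line make the line bounce between CPU
 * caches under a write-invalidate protocol; padding each counter to its own
 * line avoids the contention.  The 64-byte line size is an assumption. */
#include <pthread.h>
#include <stdio.h>

#define LINE 64

struct padded_counter {
    volatile long value;
    char pad[LINE - sizeof(long)];   /* keep each counter on its own line */
};

static struct padded_counter counters[2];

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (long i = 0; i < 10000000; i++)
        counters[id].value++;        /* each thread touches only its own line */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = { 0, 1 };

    pthread_create(&t[0], NULL, worker, &ids[0]);
    pthread_create(&t[1], NULL, worker, &ids[1]);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);

    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}
```

Removing the pad array puts both counters on one line and typically slows the loops noticeably on an SMP, which is the coherence traffic the slide is referring to.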

  8. MP OS Structure - 1 • Separate Supervisor - • all processors have their own copy of the kernel. • Some data is shared for interaction • dedicated I/O devices and file systems • good fault tolerance • bad for concurrency CS523S: Operating Systems

  9. MP OS Structure - 2 • Master/Slave Configuration • master monitors the status and assigns work to other processors (slaves) • Slaves are a schedulable pool of resources for the master • master can be bottleneck • poor fault tolerance CS523S: Operating Systems

  10. MP OS Structure - 3 • Symmetric Configuration - Most Flexible. • all processors are autonomous and treated as equals • one copy of the kernel executed concurrently across all processors • Synchronize access to shared data structures: • Lock entire OS - Floating Master • Mitigated by dividing the OS into segments that normally have little interaction • multithreaded kernel with controlled access to resources (a continuum) CS523S: Operating Systems

  11. MP Overview [taxonomy: MultiProcessor divides into SIMD and MIMD; MIMD divides into Shared Memory (tightly coupled) and Distributed Memory (loosely coupled); shared-memory systems are Symmetric (SMP) or Master/Slave, while distributed-memory systems are Clusters] CS523S: Operating Systems

  12. SMP OS Design Issues • Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency. • Process Synchronization - disabling interrupts is not sufficient. • Process Scheduling - efficient, policy controlled, task scheduling (process/threads) • global versus per CPU scheduling • Task affinity for a particular CPU • resource accounting and intra-task thread dependencies CS523S: Operating Systems
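The point that disabling interrupts is not sufficient for process synchronization on an SMP is usually illustrated with a spin lock built on an atomic read-modify-write instruction: disabling interrupts only quiets the local CPU, while the atomic operation serializes the other processors. The sketch below uses C11 atomics as a stand-in for whatever primitive the hardware provides; it illustrates the idea, not any particular kernel's lock.

```c
/* Sketch: a test-and-set spin lock using C11 atomics.  On a multiprocessor,
 * disabling interrupts only stops the local CPU; an atomic read-modify-write
 * on shared memory is what actually serializes the other processors. */
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l)
{
    /* spin until the flag was previously clear */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* busy-wait; a real kernel would also back off or yield */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```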

  13. SMP OS design issues - 2 • Memory Management - complicated since main memory is shared by possibly many processors. Each processor must maintain its own map tables for each process • cache coherence • memory access synchronization • balancing overhead with increased concurrency • Reliability and Fault Tolerance - degrade gracefully in the event of failures CS523S: Operating Systems

  14. Typical SMP System [diagram: four 500 MHz CPUs, each with its own cache and MMU, on a shared system/memory bus to 50 ns main memory; a bridge connects the I/O subsystem (SCSI, Ethernet, video) and system functions (timer, BIOS, reset, interrupts)] • Issues: • Memory contention • Limited bus BW • I/O contention • Cache coherence • Typical I/O Bus: • 33 MHz/32-bit (132 MB/s) • 66 MHz/64-bit (528 MB/s) CS523S: Operating Systems
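The quoted I/O-bus rates follow directly from clock rate times bus width (taking 1 MB as 10^6 bytes):

```latex
33\,\mathrm{MHz} \times 32\,\mathrm{bit}
  = 33\times 10^{6}\,\mathrm{s^{-1}} \times 4\,\mathrm{B} = 132\,\mathrm{MB/s},
\qquad
66\,\mathrm{MHz} \times 64\,\mathrm{bit}
  = 66\times 10^{6}\,\mathrm{s^{-1}} \times 8\,\mathrm{B} = 528\,\mathrm{MB/s}.
```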

  15. Some Definitions • Parallelism: degree to which a multiprocessor application achieves parallel execution • Concurrency: Maximum parallelism an application can achieve with unlimited processors • System Concurrency: kernel recognizes multiple threads of control in a program • User Concurrency: User space threads (coroutines) to provide a natural programming model for concurrent applications. Concurrency not supported by system. CS523S: Operating Systems

  16. Process and Threads • Process: encompasses • set of threads (computational entities) • collection of resources • Thread: Dynamic object representing an execution path and computational state. • threads have their own computational state: PC, stack, user registers and private data • Remaining resources are shared amongst threads in a process CS523S: Operating Systems
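The split described here, with per-thread execution state but shared process resources, can be seen directly in a few lines of pthreads code (pthreads is used only because the deck discusses it later; the variable names are invented):

```c
/* Sketch: "shared resources, private execution state" in practice.  Both
 * threads see the same process-wide global, but each has its own stack, so
 * the local variable is private to each thread. */
#include <pthread.h>
#include <stdio.h>

static int shared = 0;                 /* lives in the process image: shared */

static void *run(void *arg)
{
    int local = *(int *)arg;           /* lives on this thread's stack: private */
    shared += local;                   /* unsynchronized update, illustration only */
    printf("thread with local=%d sees shared=%d\n", local, shared);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    int one = 1, two = 2;
    pthread_create(&a, NULL, run, &one);
    pthread_create(&b, NULL, run, &two);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```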

  17. Threads • Effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism • Threads separate the notion of execution from the Process abstraction • Useful for expressing the intrinsic concurrency of a program regardless of resulting performance • Three types: User threads, kernel threads and Light Weight Processes (LWP) CS523S: Operating Systems

  18. User Level Threads • User level threads - supported by user-level thread libraries • Benefits: • no modifications required to the kernel • flexible and low cost • Drawbacks: • cannot block without blocking the entire process • no parallelism (not recognized by the kernel) CS523S: Operating Systems
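A user-level thread package of the kind described here can be built entirely in user space. The sketch below, assuming the System V <ucontext.h> interface is available, switches between two coroutines without the kernel ever seeing more than one thread of control, which is also exactly why one blocking system call would stall both of them.

```c
/* Sketch: two user-level "threads" multiplexed on a single kernel-scheduled
 * process with the System V ucontext interface.  The kernel sees only one
 * thread of control. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[16384], stack_b[16384];

static void task_a(void)
{
    printf("A: start\n");
    swapcontext(&ctx_a, &ctx_b);          /* user-space "yield" to B */
    printf("A: done\n");                  /* on return, uc_link resumes B */
}

static void task_b(void)
{
    printf("B: start\n");
    swapcontext(&ctx_b, &ctx_a);          /* yield back to A */
    printf("B: done\n");                  /* on return, uc_link resumes main */
}

int main(void)
{
    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp   = stack_a;
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link          = &ctx_b;      /* when A finishes, continue B */
    makecontext(&ctx_a, task_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp   = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link          = &main_ctx;   /* when B finishes, return to main */
    makecontext(&ctx_b, task_b, 0);

    swapcontext(&main_ctx, &ctx_a);       /* start the first coroutine */
    printf("main: both coroutines finished\n");
    return 0;
}
```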

  19. Kernel Level Threads • Kernel level threads - the kernel directly supports multiple threads of control in a process. The thread is the basic scheduling entity • Benefits: • coordination between scheduling and synchronization • less overhead than a process • suitable for parallel applications • Drawbacks: • more expensive than user-level threads • generality leads to greater overhead CS523S: Operating Systems

  20. Light Weight Processes (LWP) • Kernel-supported user thread • Each LWP is bound to one kernel thread • a kernel thread need not be bound to an LWP • LWPs are scheduled by the kernel • User threads are scheduled by the library onto LWPs • Multiple LWPs per process CS523S: Operating Systems
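With a two-level implementation like this, an application can ask for a thread that is bound to its own LWP instead of being multiplexed by the library. A small sketch using the POSIX contention-scope attribute (assuming the implementation supports system scope, which on Solaris corresponds to a bound thread):

```c
/* Sketch: requesting a "bound" thread.  PTHREAD_SCOPE_SYSTEM asks for a
 * thread scheduled directly by the kernel (on Solaris, bound to its own LWP);
 * PTHREAD_SCOPE_PROCESS leaves the library to multiplex the thread over the
 * pool of LWPs. */
#include <pthread.h>
#include <stdio.h>

static void *work(void *arg)
{
    (void)arg;
    puts("running on a kernel-scheduled (bound) thread");
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
        fprintf(stderr, "system scope not supported here\n");

    pthread_create(&tid, &attr, work, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```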

  21. First-Class Threads (Psyche OS) • Thread operations in user space: • create, destroy, synch, context switch • kernel threads implement a virtual processor • Coarse grain in kernel - preemptive scheduling • Communication between kernel and threads library • shared data structures. • Software interrupts (user upcalls or signals). For example, for scheduling decisions and preemption warnings. • Kernel scheduler interface - allows dissimilar thread packages to coordinate. CS523S: Operating Systems

  22. Scheduler Activations • An activation: • serves as execution context for running thread • notifies thread of kernel events (upcall) • space for kernel to save processor context of current user thread when stopped by kernel • kernel is responsible for processor allocation => preemption by kernel. • Thread package responsible for scheduling threads on available processors (activations) CS523S: Operating Systems

  23. Support for Threading • BSD: • process model only. 4.4 BSD enhancements. • Solaris: provides • user threads, kernel threads and LWPs • Mach: supports • kernel threads and tasks. Thread libraries provide the semantics of user threads, LWPs and kernel threads. • Digital UNIX: extends Mach to provide the usual UNIX semantics. • Pthreads library. CS523S: Operating Systems

  24. Solaris Threads • Supports: • user threads (uthreads) via libthread and libpthread • LWPs, each acting as a virtual CPU for user threads • kernel threads (kthreads); every LWP is associated with one kthread, but a kthread need not have an LWP • interrupts as threads CS523S: Operating Systems

  25. Solaris kthreads • Fundamental scheduling/dispatching object • all kthreads share the same virtual address space (the kernel's) - cheap context switch • System threads - for example, STREAMS and callout • kthread_t, /usr/include/sys/thread.h • scheduling info, pointers for scheduler or sleep queues, pointer to klwp_t and proc_t CS523S: Operating Systems

  26. Solaris LWP • Bound to a kthread • LWP-specific fields from proc are kept in klwp_t (/usr/include/sys/klwp.h) • user-level registers, system call params, resource usage, pointer to kthread_t and proc_t • the klwp_t can be swapped out along with the LWP • LWP non-swappable info is kept in kthread_t CS523S: Operating Systems

  27. Solaris LWP (cont) • All LWPs in a process share: • signal handlers • Each may have its own • signal mask • alternate stack for signal handling • No global name space for LWPs CS523S: Operating Systems

  28. Solaris User Threads • Implemented in user libraries • library provides synchronization and scheduling facilities • threads may be bound to LWPs • unbound threads compete for available LWPs • Manage thread specific info • thread id, saved register state, user stack, signal mask, priority*, thread local storage • Solaris provides two libraries: libthread and libpthread. • Try man thread or man pthreads CS523S: Operating Systems

  29. Solaris Thread Data Structures [diagram: the proc_t's p_tlist points to the process's list of kthread_t structures; each kthread_t links to the next via t_forw, back to the process via t_procp, and to its klwp_t via t_lwp; the klwp_t points back via lwp_thread and lwp_procp] CS523S: Operating Systems
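The linkage on this slide reduces to a handful of mutual pointers. The sketch below keeps only the fields named on the slide; the real definitions in <sys/proc.h>, <sys/thread.h> and <sys/klwp.h> contain many more members, so treat this purely as a reading aid.

```c
/* Simplified sketch of the Solaris linkage shown on the slide.  Only the
 * pointer fields named there are kept; the real kernel structures are far
 * larger. */
struct proc;
struct klwp;

typedef struct kthread {
    struct kthread *t_forw;      /* next kthread on the process's thread list */
    struct klwp    *t_lwp;       /* LWP bound to this kernel thread (or NULL) */
    struct proc    *t_procp;     /* back pointer to the owning process        */
} kthread_t;

typedef struct klwp {
    struct kthread *lwp_thread;  /* the kthread this LWP is bound to          */
    struct proc    *lwp_procp;   /* back pointer to the owning process        */
} klwp_t;

typedef struct proc {
    kthread_t *p_tlist;          /* head of the process's list of kthreads    */
} proc_t;
```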

  30. Solaris: Processes, Threads and LWPs [diagram: at the user level, each process's user threads are multiplexed onto LWPs (L); in the kernel, each LWP is backed by a kernel thread (kthr), alongside interrupt threads (Int); at the hardware level the kernel threads are dispatched onto processors (P)] CS523S: Operating Systems

  31. Solaris Interrupts • One system wide clock kthread • pool of 9 partially initialized kthreads per CPU for interrupts • interrupt thread can block • interrupted thread is pinned to the CPU CS523S: Operating Systems

  32. Solaris Signals and Fork • Signals are divided into traps (synchronous) and interrupts (asynchronous) • each thread has its own signal mask; there is one global set of signal handlers • Each LWP can specify an alternate stack • fork replicates all LWPs • fork1 replicates only the invoking LWP/thread CS523S: Operating Systems
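The per-thread signal mask with a single global handler table corresponds to the POSIX interface shown below; this is a generic pthread_sigmask illustration rather than the Solaris kernel structures themselves.

```c
/* Sketch: a per-thread signal mask alongside a process-global handler, using
 * the POSIX interface (pthread_sigmask). */
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void on_sigint(int sig)
{
    (void)sig;
    write(1, "SIGINT delivered\n", 17);      /* handler is shared by all threads */
}

static void *masked_worker(void *arg)
{
    sigset_t set;
    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* this thread alone blocks SIGINT */
    sleep(5);                                /* SIGINT is delivered to another thread */
    return NULL;
}

int main(void)
{
    pthread_t t;
    signal(SIGINT, on_sigint);               /* one global handler table */
    pthread_create(&t, NULL, masked_worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
```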

  33. Mach • Two abstractions: • Task - static object: an address space and system resources called port rights. • Thread - fundamental execution unit; runs in the context of a task. • Zero or more threads per task • kernel schedulable • kernel stack • computational state • Processor sets - available processors divided into non-intersecting sets. • permits dedicating processor sets to one or more tasks CS523S: Operating Systems

  34. Mach c-thread Implementations • Coroutine-based - multiplexes user threads onto a single-threaded task • Thread-based - one-to-one mapping from c-threads to Mach threads. Default. • Task-based - one Mach task per c-thread. CS523S: Operating Systems
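For reference, the C-threads programming interface is tiny; the sketch below assumes the historical <cthreads.h> API (cthread_fork/cthread_join), and under the default thread-based implementation each forked c-thread would map one-to-one onto a Mach kernel thread.

```c
/* Sketch assuming the classic Mach C-threads interface from <cthreads.h>. */
#include <cthreads.h>
#include <stdio.h>

static any_t work(any_t arg)
{
    printf("c-thread got %d\n", (int)(long)arg);
    return (any_t)0;
}

int main(void)
{
    cthread_t t = cthread_fork(work, (any_t)1L);  /* one c-thread per fork */
    cthread_join(t);                              /* wait for it to finish */
    return 0;
}
```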

  35. Digital UNIX • Based on Mach 2.5 kernel • Provides complete UNIX programmers interface • 4.3BSD code and ULTRIX code ported to Mach • u-area replaced by utask and uthread • proc structure retained CS523S: Operating Systems

  36. Digital UNIX threads • Signals divided into synchronous and asynchronous • global signal mask • each thread can define its own handlers for synchronous signals • global handlers for asynchronous signals CS523S: Operating Systems

  37. Pthreads library • One Mach thread per pthread • implements asynchronous I/O • a separate thread is created for the synchronous I/O call and signals the original thread when it completes • library includes signal handling, scheduling functions, and synchronization primitives. CS523S: Operating Systems
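The asynchronous-I/O pattern described here (a helper thread performs the blocking call and then notifies the requester) looks roughly like the generic pthreads sketch below; all names are invented and this is not the Digital UNIX library code.

```c
/* Sketch of the pattern on the slide: asynchronous I/O built from a helper
 * thread that performs the blocking read and then signals the requester. */
#include <pthread.h>
#include <unistd.h>

struct aio_req {
    int             fd;
    void           *buf;
    size_t          len;
    ssize_t         result;
    int             done;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
};

static void *aio_worker(void *arg)
{
    struct aio_req *r = arg;
    ssize_t n = read(r->fd, r->buf, r->len);   /* the blocking, synchronous call */
    pthread_mutex_lock(&r->lock);
    r->result = n;
    r->done = 1;
    pthread_cond_signal(&r->cond);             /* "signal" the original thread */
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Start the request; returns immediately while the helper thread blocks. */
static void aio_start(struct aio_req *r)
{
    pthread_t t;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->done = 0;
    pthread_create(&t, NULL, aio_worker, r);
    pthread_detach(t);
}

/* Wait for completion and collect the result. */
static ssize_t aio_wait(struct aio_req *r)
{
    pthread_mutex_lock(&r->lock);
    while (!r->done)
        pthread_cond_wait(&r->cond, &r->lock);
    pthread_mutex_unlock(&r->lock);
    return r->result;
}
```

A caller fills in fd, buf and len, calls aio_start(), continues with other work, and later calls aio_wait() to collect the byte count.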

  38. Mach Continuations • Address the problem of excessive kernel stack memory requirements • process model versus interrupt model • a kernel stack per thread versus a single kernel stack per processor • The blocking thread is first responsible for saving any required state (the thread structure allows up to 28 bytes) • and indicates a function to be invoked when unblocked (the continuation function) • Advantage: stacks can be handed off between threads, eliminating copy overhead. CS523S: Operating Systems

  39. Threads in Windows NT • Design driven by the need to support a variety of OS environments • NT process implemented as an object • an executable process contains >= 1 thread • process and thread objects have built-in synchronization capabilities CS523S: Operating Systems

  40. NT Threads • Support for kernel (system) threads • Threads are scheduled by the kernel and thus are similar to UNIX threads bound to an LWP (kernel thread) • fibers are threads which are not scheduled by the kernel and thus are similar to unbound user threads. CS523S: Operating Systems
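Fibers are the unbound-user-thread analogue in Win32; a minimal sketch using ConvertThreadToFiber, CreateFiber and SwitchToFiber shows a context switch the kernel scheduler never participates in.

```c
/* Sketch: an NT fiber is switched in user space with SwitchToFiber(); the
 * kernel scheduler only ever sees the thread hosting the fibers, which is
 * what makes fibers analogous to unbound user-level threads. */
#include <windows.h>
#include <stdio.h>

static LPVOID main_fiber;

static VOID CALLBACK fiber_func(PVOID param)
{
    printf("running in fiber: %s\n", (const char *)param);
    SwitchToFiber(main_fiber);               /* cooperative switch back */
}

int main(void)
{
    main_fiber = ConvertThreadToFiber(NULL);          /* host thread becomes a fiber */
    LPVOID f = CreateFiber(0, fiber_func, (LPVOID)"hello");  /* default stack size */

    SwitchToFiber(f);                        /* user-space switch, no kernel scheduling */
    printf("back on the main fiber\n");

    DeleteFiber(f);
    return 0;
}
```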

  41. 4.4 BSD UNIX • Initial support for threads implemented but not enabled in distribution • Proc structure and u-area reorganized • All threads have a unique ID • How are the proc and u areas reorganized to support threads? CS523S: Operating Systems
