1 / 39

Multiprocessors and Threads

Multiprocessors and Threads. Fred Kuhns (fredk@arl.wustl.edu, http://www.arl.wustl.edu/~fredk) Department of Computer Science and Engineering Washington University in St. Louis. Motivation for Multiprocessors. Enhanced Performance -

shen
Download Presentation

Multiprocessors and Threads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multiprocessors and Threads Fred Kuhns (fredk@arl.wustl.edu, http://www.arl.wustl.edu/~fredk) Department of Computer Science and Engineering Washington University in St. Louis

  2. Motivation for Multiprocessors • Enhanced Performance - • Concurrent execution of tasks for increased throughput (between processes) • Exploit Concurrency in Tasks (Parallelism within process) • Fault Tolerance - • graceful degradation in face of failures Cs422 – Operating Systems Organization

  3. Basic MP Architectures • Single Instruction Single Data (SISD) • conventional uniprocessor designs. • Single Instruction Multiple Data (SIMD) • Vector and Array Processors • Multiple Instruction Single Data (MISD) • Not Implemented. • Multiple Instruction Multiple Data (MIMD) • conventional MP designs Cs422 – Operating Systems Organization

  4. MIMD Classifications • Tightly Coupled System- all processors share the same global memory and have the same address spaces (Typical SMP system). • Main memory for IPC and Synchronization. • Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, Clusters (Multi-Computer). • Message passing for IPC and synchronization. Cs422 – Operating Systems Organization

  5. CPU CPU CPU CPU cache MMU cache MMU cache MMU cache MMU Interconnection Network MM MM MM MM MP Block Diagram Cs422 – Operating Systems Organization

  6. Memory Access Schemes • Uniform Memory Access (UMA) • Centrally located • All processors are equidistant (access times) • NonUniform Access (NUMA) • physically partitioned but accessible by all • processors have the same address space • NO Remote Memory Access (NORMA) • physically partitioned, not accessible by all • processors have own address space Cs422 – Operating Systems Organization

  7. Other Details of MP • Interconnection technology • Bus • Cross-Bar switch • Multistage Interconnect Network • Caching - Cache Coherence Problem! • Write-update • Write-invalidate • bus snooping Cs422 – Operating Systems Organization

  8. MP OS Structure - 1 • Separate Supervisor - • all processors have own copy of the kernel. • Some share data for interaction • dedicated I/O devices and file systems • good fault tolerance but bad for concurrency • Master/Slave Configuration • Master: monitors status and assigns work • Slaves: schedulable pool of resources • master can be bottleneck • poor fault tolerance Cs422 – Operating Systems Organization

  9. MP OS Structure - 2 • Symmetric Configuration - Most Flexible. • all processors are autonomous, treated equal • one copy of the kernel executed concurrently across all processors • Synchronized access to shared data structures: • Lock entire OS - Floating Master • Mitigated by dividing OS into segments that normally have little interaction • multithread kernel and control access to resources (continuum) Cs422 – Operating Systems Organization

  10. MP Overview MultiProcessor SIMD MIMD Shared Memory (tightly coupled) Distributed Memory (loosely coupled) Symmetric (SMP) Clusters Master/Slave Cs422 – Operating Systems Organization

  11. SMP OS Design Issues • Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency. • Process Synchronization - disabling interrupts is not sufficient. • Process Scheduling - efficient, policy controlled, task scheduling. Issues: • Global versus Local (per CPU) • Task affinity for a particular CPU • resource accounting • inter-thread dependencies Cs422 – Operating Systems Organization

  12. SMP OS design issues - cont. • Memory Management - complication of shared main memory. • cache coherence • memory access synchronization • balancing overhead with increased concurrency • Reliability and fault Tolerance - degrade gracefully in the event of failures Cs422 – Operating Systems Organization

  13. Main Memory Typical SMP System CPU CPU CPU CPU 500MHz cache MMU cache MMU cache MMU cache MMU System/Memory Bus • Issues: • Memory contention • Limited bus BW • I/O contention • Cache coherence I/O subsystem 50ns Bridge INT ether System Functions (timer, BIOS, reset) scsi • Typical I/O Bus: • 33MHz/32bit (132MB/s) • 66MHz/64bit (528MB/s) video Cs422 – Operating Systems Organization

  14. Some Useful Definitions • Parallelism: degree to which a multiprocessor application achieves parallel execution • Concurrency: Maximum parallelism an application can achieve with unlimited processors • System Concurrency: kernel recognizes multiple threads of control in a program • User Concurrency: User space threads (coroutines) provide a natural programming model for concurrent applications. Cs422 – Operating Systems Organization

  15. Introduction to Threads Multithreaded Process Model Single-Threaded Process Model Thread Thread Thread Thread Control Block Thread Control Block Thread Control Block Process Control Block User Stack Process Control Block User Stack User Stack User Stack User Address Space Kernel Stack User Address Space Kernel Stack Kernel Stack Kernel Stack Cs422 – Operating Systems Organization

  16. Process Concept Embodies • Unit of Resource ownership - process is allocated a virtual address space to hold the process image • Unit of Dispatching- process is an execution path through one or more programs • execution may be interleaved with other processes • These two characteristics are treated independently by the operating system Cs422 – Operating Systems Organization

  17. Threads • Effectiveness of parallel computing depends on the performanceof the primitives used to express and control parallelism • Separate notion of execution from Process abstraction • Useful for expressing the intrinsic concurrency of a program regardless of resulting performance • We will discuss Two examples of threading: • User threads, • Kernel threads Cs422 – Operating Systems Organization

  18. Threads cont. • Thread : Dynamic object representing an execution path and computational state. • One or more threads per process, each having: • Execution state (running, ready, etc.) • Saved thread context when not running • Execution stack • Per-thread static storage for local variables • Shared access to process resources • all threads of a process share a common address space. Cs422 – Operating Systems Organization

  19. Thread States • Primary states: • Running, Ready and Blocked. • Operations to change state: • Spawn: new thread provided register context and stack pointer. • Block: event wait, save user registers, PC and stack pointer • Unblock: moved to ready state • Finish: deallocate register context and stacks. Cs422 – Operating Systems Organization

  20. P P P P P P Many-to-One One-to-One Many-to-Many Threading Models • User threads := Many-to-One • kernel threads := One-to-One • Mixed user and kernel := Many-to-Many Cs422 – Operating Systems Organization

  21. User Level Threads • User level threads - supported by user level threads libraries • Examples • POSIX Pthreads, Mach C-threads, Solaris ui-threads • Benefits: • no modifications required to kernel • flexible and low cost • Drawbacks: • can not block without blocking entire process • no parallelism (not recognized by kernel) Cs422 – Operating Systems Organization

  22. Kernel Level Threads • Kernel level threads - directly supported by kernel, thread is the basic scheduling entity • Examples: • Windows 95/98/NT/2000, Solaris, Tru64 UNIX, BeOS, Linux • Benefits: • coordination between scheduling and synchronization • less overhead than a process • suitable for parallel application • Drawbacks: • more expensive than user-level threads • generality leads to greater overhead Cs422 – Operating Systems Organization

  23. Threading Issues • fork and exec • should fork duplicate one, some or all threads • Cancellation – cancel the target thread, issues with freeing resources and inconsistent state • asynchronous cancellation – target is immediately canceled • deferred cancellation – target checks periodically. check at cancellation points Cs422 – Operating Systems Organization

  24. Threading Issues • Signals: generation, posting and delivery • Every signal handled by a default or user-defined handler • Signal delivery: • to thread for which it may apply • to every thread in process • to certain threads • specifically designated thread (signal thread) • synchronous signals should go to thread causing the signal • what about asynchronous signals? • Solaris: deliver to a special thread which forward to first user created thread that has not blocked the signal. Cs422 – Operating Systems Organization

  25. Threading Issues • Bounding the number of threads created in a dynamic environment • use thread pools • Al threads share some address space: • use of thread specific data Cs422 – Operating Systems Organization

  26. Pthreads • a POSIX standard (IEEE 1003.1c) API for thread creation and synchronization. • API specifies behavior of the thread library, not the implementation. • Common in UNIX operating systems. • Programs must include <pthread.h> Cs422 – Operating Systems Organization

  27. UNIX Support for Threading • BSD: • pthreads and similar user space implementations • process model only. 4.4 BSD enhancements. • BSD based OSes are adding support for threads • Solaris • user threads, kernel threads, LWPs and in 2.6 Scheduler Activations • Mach • kernel threads and tasks. Thread libraries provide semantics of user threads, LWPs and kernel threads. • Digital UNIX - extends MACH to provide usual UNIX semantics: Pthreads library. Cs422 – Operating Systems Organization

  28. Solaris Threads • Supports: • user threads (uthreads) via libthread and libpthread • LWPs, abstraction that acts as a virtual CPU for user threads. • LWP is bound to a kthread. • kernel threads (kthread), every LWP is associated with one kthread, however a kthread may not have an LWP • interrupts as threads Cs422 – Operating Systems Organization

  29. Solaris kthreads • Fundamental scheduling/dispatching object • all kthreads share same virtual address space (the kernels) - cheap context switch • System threads - example STREAMS, callout • kthread_t, /usr/include/sys/thread.h • scheduling info, pointers for scheduler or sleep queues, pointer to klwp_t and proc_t Cs422 – Operating Systems Organization

  30. Solaris LWP • Kernel provided mechanism to allow for both user and kernel thread implementation on one platform. • Bound to akthread • LWP data (see /usr/include/sys/klwp.h) • user-level registers, system call params, resource usage, pointer to kthread_t and proc_t • All LWPs in a process share: • signal handlers • Each may have its own • signal mask • alternate stack for signal handling • No global name space for LWPs Cs422 – Operating Systems Organization

  31. Solaris User Threads • Implemented in user libraries • library provides synchronization and scheduling facilities • threads may be bound to LWPs • unbound threads compete for available LWPs • Manage thread specific info • thread id, saved register state, user stack, signal mask, priority*, thread local storage • Solaris provides two libraries: libthread and libpthread. • Try man thread or man pthreads Cs422 – Operating Systems Organization

  32. Solaris Thread Data Structures proc_t p_tlist kthread_t t_procp t_lwp klwp_t t_forw lwp_thread lwp_procp Cs422 – Operating Systems Organization

  33. L L L L ... ... ... P P P Solaris Threading Model (Combined) Process 2 Process 1 user Int kthr kernel hardware Cs422 – Operating Systems Organization

  34. Solaris User Level Threads Stop Wakeup Runnable Continue Stop Stopped Sleeping Preempt Dispatch Stop Active Sleep Cs422 – Operating Systems Organization

  35. Solaris Lightweight Processes Timeslice or Preempt Stop Running Dispatch Wakeup Blocking System Call Runnable Stopped Continue Wakeup Stop Blocked Cs422 – Operating Systems Organization

  36. Solaris Interrupts • One system wide clock kthread • pool of 9 partially initialized kthreads per CPU for interrupts • interrupt thread can block • interrupted thread is pinned to the CPU Cs422 – Operating Systems Organization

  37. Solaris Signals and Fork • Divided into Traps (synchronous) and interrupts (asynchronous) • each thread has its own signal mask, global set of signal handlers • Each LWP can specify alternate stack • fork replicates all LWPs • fork1 only the invoking LWP/thread Cs422 – Operating Systems Organization

  38. Windows 2000 Threads • Implements the one-to-one mapping. • There is also support for user lever threads called fibers • Each thread contains - a thread id - register set - separate user and kernel stacks - private data storage area Cs422 – Operating Systems Organization

  39. Linux Threads • Linux refers to processes and threads tasks • Thread creation is done through clone() system call. • Clone() allows a child task to share the address space of the parent task (process) Cs422 – Operating Systems Organization

More Related